Abstract

A number of recent results on optimization problems involving submodular functions have made use
of the multilinear relaxation of the problem. These results typically hold in the value oracle model,
where the objective function is accessible via a black box returning $f(S)$ for a given $S$. We present a
general approach to deriving inapproximability results in the value oracle model, based on the notion of
symmetry gap. Our main result is that for any fixed instance that exhibits a certain symmetry gap in
its multilinear relaxation, there is a naturally related class of instances for which a better approximation
factor than the symmetry gap would require exponentially many oracle queries. This unifies several
known hardness results for submodular maximization, e.g. the optimality of $(1 - 1/e)$-approximation
for monotone submodular maximization under a cardinality constraint, and the impossibility of
$(\frac{1}{2} + \epsilon)$-approximation for unconstrained (non-monotone) submodular maximization.

As a new application, we consider the problem of maximizing a non-monotone submodular function
over the bases of a matroid. A $(\frac{1}{6} - o(1))$-approximation has been developed for this problem, assuming
that the matroid contains two disjoint bases. We show that the best approximation one can achieve
is indeed related to packings of bases in the matroid. Specifically, for any $k \ge 2$, there is a class of
matroids of fractional base packing number $\nu = \frac{k}{k-1}$ such that any algorithm achieving a better than
$(1 - \frac{1}{\nu})$-approximation for this class would require exponentially many value queries. In particular,
there is no constant-factor approximation for maximizing a non-monotone submodular function over the
bases of a general matroid. On the positive side, we present a $\frac{1}{2}(1 - \frac{1}{\nu} - o(1))$-approximation algorithm
assuming fractional base packing number at least $\nu$ where $\nu \in (1, 2]$. We also present an improved
0.309-approximation for maximization of a non-monotone submodular function subject to a matroid
independence constraint, improving the previously known factor of $\frac{1}{4} - \epsilon$. For this problem, we obtain a
hardness of $(\frac{1}{2} + \epsilon)$-approximation for any fixed $\epsilon > 0$.

Keywords: submodular functions, approximation algorithms, query complexity. 

1 Introduction 

Submodular set functions are defined by the following condition for all pairs of sets $S, T$:

$$f(S \cup T) + f(S \cap T) \le f(S) + f(T),$$

or equivalently by the property that the marginal value of any element, $f_S(j) = f(S + j) - f(S)$, satisfies
$f_T(j) \le f_S(j)$, whenever $j \notin T \supseteq S$. In addition, a set function is called monotone if $f(S) \le f(T)$ whenever
$S \subseteq T$. Throughout this paper, we assume that $f(S)$ is nonnegative.
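
The two definitions above can be checked by brute force on small ground sets. The following Python sketch is an illustration only; the helper names (`subsets`, `is_submodular`, `is_monotone`) and the two example functions are assumptions of this sketch, not part of the paper.

```python
from itertools import combinations

def subsets(ground):
    """Yield every subset of `ground` as a frozenset."""
    items = list(ground)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

def is_submodular(f, ground):
    """Check f(S | T) + f(S & T) <= f(S) + f(T) for all pairs S, T."""
    all_sets = list(subsets(ground))
    return all(f(S | T) + f(S & T) <= f(S) + f(T)
               for S in all_sets for T in all_sets)

def is_monotone(f, ground):
    """Check f(S) <= f(T) whenever S is a subset of T."""
    all_sets = list(subsets(ground))
    return all(f(S) <= f(T) for S in all_sets for T in all_sets if S <= T)

def cut_k2(S):
    """Cut function of a single edge {1, 2}: value 1 iff exactly one endpoint is chosen."""
    return 1 if len(S) == 1 else 0

def at_least_one(S):
    """f(S) = min(|S|, 1), a simple monotone submodular function."""
    return min(len(S), 1)
```

The exhaustive check takes time exponential in the size of the ground set, so it is only meant for toy instances of a handful of elements.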

Submodular functions have been studied in the context of combinatorial optimization since the 1970's,
especially in connection with matroids [71 |S1 UHl IHl 1221 1301 1311 ESI HI]. Submodular functions appear
mostly for the following two reasons: (i) submodularity arises naturally in various combinatorial settings,
and many algorithmic applications use it either explicitly or implicitly; (ii) submodularity has a natural
interpretation as the property of diminishing returns, which defines an important class of utility/valuation
functions. Submodularity as an abstract concept is both general enough to be useful for applications and
structured enough to allow strong positive results. A fundamental algorithmic result is that any
submodular function given by a value oracle can be minimized in strongly polynomial time [24].

*An extended abstract appeared in IEEE FOCS 2009. This work was done while the author was at Princeton University.

In contrast to submodular minimization, submodular maximization problems are typically hard to solve
exactly. Consider the classical problem of maximizing a monotone submodular function subject to a cardinality
constraint, $\max\{f(S) : |S| \le k\}$. It is known that this problem admits a $(1 - 1/e)$-approximation
PO] and this is optimal for two different reasons: (i) Given only black-box access to $f(S)$, we cannot achieve
a better approximation, unless we ask exponentially many value queries [22]. This holds even if we allow
unlimited computational power. (ii) In certain special cases where $f(S)$ has a compact representation on
the input, it is NP-hard to achieve an approximation better than $1 - 1/e$. The reason why the hardness
threshold is the same in both cases is apparently not well understood.

An optimal $(1 - 1/e)$-approximation for the problem $\max\{f(S) : |S| \le k\}$ where $f$ is monotone submodular
is achieved by a simple greedy algorithm [50]. This seems to be rather coincidental; for other variants of
submodular maximization, such as unconstrained (non-monotone) submodular maximization [10], monotone
submodular maximization subject to a matroid constraint [21, 3, 27], or submodular maximization subject
to linear constraints [15, 16], greedy algorithms achieve suboptimal results. A tool which has proven useful
in approaching these problems is the multilinear relaxation.
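
For concreteness, the greedy rule for the cardinality-constrained problem can be sketched as follows. This is a minimal illustration using only value-oracle calls; the function name `greedy_max` and the small coverage instance `SETS` are hypothetical and chosen only for the example.

```python
def greedy_max(f, ground, k):
    """Pick up to k elements one at a time, each maximizing the marginal gain."""
    chosen = frozenset()
    for _ in range(k):
        candidates = [j for j in ground if j not in chosen]
        if not candidates:
            break
        # Each evaluation of f is one value-oracle query.
        best = max(candidates, key=lambda j: f(chosen | {j}) - f(chosen))
        chosen = chosen | {best}
    return chosen

# Hypothetical coverage instance: element j covers the items in SETS[j].
SETS = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6}, "d": {1, 6}}

def coverage(S):
    """Number of items covered by the chosen elements (monotone submodular)."""
    covered = set()
    for j in S:
        covered |= SETS[j]
    return len(covered)

solution = greedy_max(coverage, sorted(SETS), 2)
```

On this toy instance the greedy solution happens to be optimal; in general the guarantee discussed above is only the factor $1 - 1/e$.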

Multilinear relaxation. Let us consider a discrete optimization problem $\max\{f(S) : S \in \mathcal{F}\}$, where
$f : 2^X \to \mathbb{R}$ is the objective function and $\mathcal{F} \subseteq 2^X$ is the collection of feasible solutions. In case $f$ is a
linear function, $f(S) = \sum_{j \in S} w_j$, it is natural to replace this problem by a linear programming problem.
For a general set function $f(S)$, however, a linear relaxation is not readily available. Instead, the following
relaxation has been proposed [S] [571 H]: For $\mathbf{x} \in [0,1]^X$, let $\hat{\mathbf{x}}$ denote a random vector in $\{0,1\}^X$ where each
coordinate $\hat{x}_i$ is rounded independently to 1 with probability $x_i$ and to 0 otherwise. We define

$$F(\mathbf{x}) = \mathbf{E}[f(\hat{\mathbf{x}})] = \sum_{S \subseteq X} f(S) \prod_{i \in S} x_i \prod_{j \notin S} (1 - x_j).$$

This is the unique multilinear polynomial which coincides with $f$ on $\{0,1\}$-vectors. We remark that although
we cannot compute the exact value of $F(\mathbf{x})$ for a given $\mathbf{x} \in [0,1]^X$ (which would require querying $2^n$ possible
values of $f(S)$), we can compute $F(\mathbf{x})$ approximately, by random sampling. Sometimes this causes technical
issues, which we also deal with in this paper.
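
The contrast between exact evaluation and sampling can be illustrated as follows. This is a sketch under the assumption that the ground set is indexed $0, \ldots, n-1$; the function names and the sample count are illustrative choices, not taken from the paper.

```python
import random
from itertools import product

def multilinear_exact(f, x):
    """F(x) = sum over S of f(S) * prod_{i in S} x_i * prod_{j not in S} (1 - x_j).

    Enumerates all 2^n subsets, so it is only usable for very small n.
    """
    total = 0.0
    for bits in product([0, 1], repeat=len(x)):
        prob = 1.0
        for xi, b in zip(x, bits):
            prob *= xi if b else 1.0 - xi
        S = frozenset(i for i, b in enumerate(bits) if b)
        total += prob * f(S)
    return total

def multilinear_sampled(f, x, samples=20000, seed=0):
    """Estimate F(x) by averaging f over independently rounded random sets."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(samples):
        S = frozenset(i for i, xi in enumerate(x) if rng.random() < xi)
        acc += f(S)
    return acc / samples

def cut_k2(S):
    """Cut function of a single edge {0, 1}."""
    return 1 if len(S) == 1 else 0
```

The sampled estimate uses one oracle query per sample and carries a statistical error, which is the source of the technical issues mentioned above.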

Instead of the discrete problem $\max\{f(S) : S \in \mathcal{F}\}$, we consider a continuous optimization problem
$\max\{F(\mathbf{x}) : \mathbf{x} \in P(\mathcal{F})\}$, where $P(\mathcal{F})$ is the convex hull of characteristic vectors corresponding to $\mathcal{F}$,

$$P(\mathcal{F}) = \mathrm{conv}(\{\mathbf{1}_I : I \in \mathcal{F}\}).$$

The reason why the extension $F(\mathbf{x}) = \mathbf{E}[f(\hat{\mathbf{x}})]$ is useful for submodular maximization problems is that $F(\mathbf{x})$
has convexity properties that allow one to solve the continuous problem $\max\{F(\mathbf{x}) : \mathbf{x} \in P\}$ (within a constant
factor) in a number of interesting cases. Moreover, fractional solutions can often be rounded to discrete ones
without losing anything in terms of the objective value. Then, our ability to solve the multilinear relaxation
approximately translates directly into an approximation algorithm for the original problem. In particular,
this is true when the collection of feasible solutions forms a matroid.

Pipage rounding was originally developed by Ageev and Sviridenko for rounding solutions in the bipartite
matching polytope [2]. The technique was adapted to matroid polytopes by Calinescu et al. [5], who
proved that for any submodular function $f(S)$ and any $\mathbf{x}$ in the matroid base polytope $B(\mathcal{M})$, the fractional
solution $\mathbf{x}$ can be rounded to a base $B \in \mathcal{B}$ such that $f(B) \ge F(\mathbf{x})$. This approach leads to an optimal
$(1 - 1/e)$-approximation for the Submodular Welfare Problem, and more generally for monotone submodular
maximization subject to a matroid constraint [3, 27]. It is also known that the factor of $1 - 1/e$ is optimal for
the Submodular Welfare Problem both in the NP framework [14] and in the value oracle model [19]. Under
the assumption that the submodular function $f(S)$ has curvature $c$, there is a $\frac{1}{c}(1 - e^{-c})$-approximation and
this is also optimal in the value oracle model [5S]. The framework of pipage rounding can also be extended
to nonmonotone submodular functions; this presents some additional issues which we discuss in this paper.

1 We denote vectors consistently in boldface ($\mathbf{x}$) and their coordinates in italics ($x_i$).

For the problem of unconstrained (non-monotone) submodular maximization, a 2/5-approximation was
developed in [10]. This algorithm implicitly uses the multilinear relaxation $\max\{F(\mathbf{x}) : \mathbf{x} \in [0,1]^X\}$. For
symmetric submodular functions, it is shown in [10] that a uniformly random solution $\mathbf{x} = (1/2, \ldots, 1/2)$
gives $F(\mathbf{x}) \ge \frac{1}{2} OPT$, and there is no better approximation algorithm in the value oracle model.

Using additional techniques, the multilinear relaxation can also be applied to submodular maximization
with knapsack constraints ($\sum_{j \in S} c_{ij} \le 1$). For the problem of maximizing a monotone submodular function
subject to a constant number of knapsack constraints, there is a $(1 - 1/e - \epsilon)$-approximation algorithm for any
$\epsilon > 0$ [15]. For maximizing a non-monotone submodular function subject to a constant number of knapsack
constraints, a $(1/5 - \epsilon)$-approximation was designed in [16].

One should mention that not all the best known results for submodular maximization have been achieved
using the multilinear relaxation. The greedy algorithm yields a $1/(k+1)$-approximation for monotone
submodular maximization subject to $k$ matroid constraints [21]. Local search methods have been used to
improve this to a $1/(k + \epsilon)$-approximation, and to obtain a $1/(k + 1 + 1/(k-1) + \epsilon)$-approximation for
the same problem with a non-monotone submodular function, for any $\epsilon > 0$ [16, 17]. For the problem of
maximizing a non-monotone submodular function over the bases of a given matroid, local search yields a
$(1/6 - \epsilon)$-approximation, assuming that the matroid contains two disjoint bases [16].

1.1 Our results 

Our main contribution (Theorem 3) is a general hardness construction that yields inapproximability results in
the value oracle model in an automated way, based on what we call the symmetry gap for some fixed instance.
In this generic fashion, we are able to replicate a number of previously known hardness results (such as the
optimality of the factors $1 - 1/e$ and $1/2$ mentioned above), and we also produce new hardness results using
this construction (Theorem 2). Our construction helps explain the particular hardness thresholds obtained
under various constraints, by exhibiting a small instance where the threshold can be seen as the gap between
the optimal solution and the best symmetric solution. The query complexity results in [10, 19, 28] can be
seen in hindsight as special cases of Theorem 3, but the construction in this paper is somewhat different and
technically more involved than the previous proofs for particular cases.

Concrete results. Before we proceed to describe our general hardness result, we present its implications 
for two more concrete problems. We also provide closely matching approximation algorithms for these two 
problems, based on the multilinear relaxation. First, we discuss the problem of maximizing a (non-monotone) 
submodular function subject to a matroid independence constraint. In the following, we assume that the 
objective function is given by a value oracle and the matroid is given by a membership oracle. 

Theorem 1. There is a randomized $\frac{1}{4}(-1 + \sqrt{5} - o(1)) \simeq 0.309$-approximation for the problem $\max\{f(S) : S \in \mathcal{I}\}$,
where $f(S)$ is a nonnegative submodular function, and $\mathcal{I}$ is a collection of independent sets in a
matroid.

For any $\epsilon > 0$, a $(\frac{1}{2} + \epsilon)$-approximation for this problem (e.g. when $\mathcal{I} = \{I : |I| \le n/2\}$) would require
exponentially many value queries.

Our algorithmic result improves a previously known $(\frac{1}{4} - o(1))$-approximation [16]. The hardness threshold
follows from our general result, but also quite easily from [10].

Secondly, we consider the problem of maximizing a (non-monotone) submodular function subject to a 
matroid base constraint. (This generalizes for example the maximum bisection problem in graphs.) We show 
that the approximability of this problem is related to base packings in the matroid. We use the following 
definition. 

Definition 1. For a matroid $\mathcal{M}$ with a collection of bases $\mathcal{B}$, the fractional base packing number is the
maximum possible value of $\sum_{B \in \mathcal{B}} \alpha_B$ for $\alpha_B \ge 0$ such that $\sum_{B \in \mathcal{B} : j \in B} \alpha_B \le 1$ for every element $j$.
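
As a small illustration of Definition 1, the following Python sketch evaluates one particular feasible packing, namely the one that places the same weight on every base. This only gives a lower bound on the fractional base packing number in general; the function names and the uniform-matroid example are assumptions of this sketch.

```python
from fractions import Fraction
from itertools import combinations

def uniform_packing_value(bases, ground):
    """Value of the packing that puts the same weight alpha on every base.

    alpha is scaled so that the most heavily loaded element carries total
    weight exactly 1, which makes the packing feasible. The result is a
    lower bound on the fractional base packing number.
    """
    load = {j: sum(1 for B in bases if j in B) for j in ground}
    alpha = Fraction(1, max(load.values()))
    return alpha * len(bases)

def uniform_matroid_bases(n, r):
    """Hypothetical example: bases of the uniform matroid are all r-subsets of range(n)."""
    return [frozenset(c) for c in combinations(range(n), r)]
```

For the uniform matroid on $n$ elements with bases of size $r$, this packing has value $n/r$, and a counting argument (each base uses $r$ units of the total element capacity $n$) shows that no packing can do better.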

Theorem 2. For any $\nu \in (1, 2]$, there is a randomized $\frac{1}{2}(1 - \frac{1}{\nu} - o(1))$-approximation for the problem
$\max\{f(S) : S \in \mathcal{B}\}$, where $f(S)$ is a nonnegative submodular function, and $\mathcal{B}$ is a collection of bases in a
matroid with fractional packing number at least $\nu$.

On the other hand, for any $\nu$ in the form $\nu = \frac{k}{k-1}$, $k \ge 2$, and any fixed $\epsilon > 0$, a $(1 - \frac{1}{\nu} + \epsilon)$-approximation
for the same problem would require exponentially many value queries.

In case the matroid contains two disjoint bases ($\nu = 2$), we obtain a $(\frac{1}{4} - o(1))$-approximation, improving
the previously known factor of $\frac{1}{6} - o(1)$ [16]. In the range of $\nu \in (1, 2]$, our positive and negative results are
within a factor of 2. For maximizing a submodular function over the bases of a general matroid, we obtain
the following.

Corollary 1. For the problem $\max\{f(S) : S \in \mathcal{B}\}$, where $f(S)$ is a nonnegative submodular function, and
$\mathcal{B}$ is a collection of bases in a matroid, any constant-factor approximation requires an exponential number of
value queries.

Hardness from the symmetry gap. Now we describe our general hardness result. Consider an instance
$\max\{f(S) : S \in \mathcal{F}\}$ which exhibits a certain degree of symmetry. This is formalized by the notion of
a symmetry group $\mathcal{G}$. We consider permutations $\sigma \in \mathbf{S}(X)$ where $\mathbf{S}(X)$ is the symmetric group (of all
permutations) on the ground set $X$. We also use $\sigma$ for the naturally induced mapping of subsets of $X$.
We say that the instance is invariant under $\mathcal{G} \subseteq \mathbf{S}(X)$, if for any $\sigma \in \mathcal{G}$ and any $S \subseteq X$, $f(S) = f(\sigma(S))$
and $S \in \mathcal{F} \Leftrightarrow \sigma(S) \in \mathcal{F}$. We emphasize that even though we apply $\sigma$ to sets, it must be derived from a
permutation on $X$. For $\mathbf{x} \in [0,1]^X$, we define the "symmetrization of $\mathbf{x}$" as

$$\bar{\mathbf{x}} = \mathbf{E}_{\sigma \in \mathcal{G}}[\sigma(\mathbf{x})],$$

where $\sigma \in \mathcal{G}$ is uniformly random and $\sigma(\mathbf{x})$ denotes $\mathbf{x}$ with coordinates permuted by $\sigma$.
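
The symmetrization operation can be computed directly when the group is small enough to list explicitly. The sketch below assumes each permutation is given as a tuple `sigma` with `sigma[i]` the image of coordinate `i`; this encoding and the function names are illustrative assumptions.

```python
from fractions import Fraction
from itertools import permutations

def apply_perm(sigma, x):
    """Return sigma(x): the entry at coordinate i of x moves to coordinate sigma[i]."""
    y = [None] * len(x)
    for i, xi in enumerate(x):
        y[sigma[i]] = xi
    return y

def symmetrize(x, group):
    """Average sigma(x) over all sigma in an explicitly listed group."""
    m = len(group)
    images = [apply_perm(sigma, x) for sigma in group]
    return [sum(column) / m for column in zip(*images)]

# Example: the group of both permutations of a two-element ground set.
swap_group = list(permutations(range(2)))
x_bar = symmetrize([Fraction(1), Fraction(0)], swap_group)
```

Exact rational arithmetic (`Fraction`) is used so that symmetrized vectors can be compared for equality without rounding concerns.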

Erratum: The main hardness result in the conference version of this paper [29] was formulated for an
arbitrary feasibility constraint $\mathcal{F}$, invariant under $\mathcal{G}$. Unfortunately, this was an error and the theorem does
not hold in that form: an algorithm could gather some information from querying $\mathcal{F}$, and this possibility
was neglected in the proof. We need to impose a stronger symmetry constraint on $\mathcal{F}$, namely that the
condition $S \in \mathcal{F}$ depends only on the symmetrized version of $S$, $\bar{\mathbf{1}}_S = \mathbf{E}_{\sigma \in \mathcal{G}}[\mathbf{1}_{\sigma(S)}]$. This is the case in all
the applications of the hardness theorem in [29] and [23] and hence these applications are not affected.

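Definition 2. We call an instance $\max\{f(S) : S \in \mathcal{F}\}$ on a ground set $X$ totally symmetric with respect to
a group of permutations $\mathcal{G}$ on $X$, if $f(S) = f(\sigma(S))$ for all $S \subseteq X$ and $\sigma \in \mathcal{G}$, and $S \in \mathcal{F} \Leftrightarrow S' \in \mathcal{F}$ whenever
$\mathbf{E}_{\sigma \in \mathcal{G}}[\mathbf{1}_{\sigma(S)}] = \mathbf{E}_{\sigma \in \mathcal{G}}[\mathbf{1}_{\sigma(S')}]$.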

Then, we define the symmetry gap as the ratio between the optimal solution of $\max\{F(\mathbf{x}) : \mathbf{x} \in P(\mathcal{F})\}$
and the best symmetric solution of this problem.

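Definition 3 (Symmetry gap). Let $\max\{f(S) : S \in \mathcal{F}\}$ be an instance on a ground set $X$, which is totally
symmetric with respect to $\mathcal{G} \subseteq \mathbf{S}(X)$. Let $F(\mathbf{x}) = \mathbf{E}[f(\hat{\mathbf{x}})]$ be the multilinear extension of $f(S)$ and
$P(\mathcal{F}) = \mathrm{conv}(\{\mathbf{1}_I : I \in \mathcal{F}\})$ the polytope associated with $\mathcal{F}$. Let $\bar{\mathbf{x}} = \mathbf{E}_{\sigma \in \mathcal{G}}[\sigma(\mathbf{x})]$. The symmetry gap of
$\max\{f(S) : S \in \mathcal{F}\}$ is defined as $\gamma = \overline{OPT}/OPT$ where

$$OPT = \max\{F(\mathbf{x}) : \mathbf{x} \in P(\mathcal{F})\},$$

$$\overline{OPT} = \max\{F(\bar{\mathbf{x}}) : \mathbf{x} \in P(\mathcal{F})\}.$$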

Next, we need to define the notion of a refinement of an instance. This is a natural way to extend a family 
of feasible sets to a larger ground set. In particular, this operation preserves the types of constraints that we 
care about, such as cardinality constraints, matroid independence, and matroid base constraints. 

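Definition 4 (Refinement). Let $\mathcal{F} \subseteq 2^X$, $|X| = k$ and $|N| = n$. We say that $\tilde{\mathcal{F}} \subseteq 2^{N \times X}$ is a refinement of
$\mathcal{F}$, if

$$\tilde{\mathcal{F}} = \left\{ \tilde{S} \subseteq N \times X \;\middle|\; (x_1, \ldots, x_k) \in P(\mathcal{F}) \text{ where } x_j = \tfrac{1}{n} |\tilde{S} \cap (N \times \{j\})| \right\}.$$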

In other words, in the refined instance, each element $j \in X$ is replaced by a set $N \times \{j\}$. We call this set
the cluster of elements corresponding to $j$. A set $\tilde{S}$ is in $\tilde{\mathcal{F}}$ if and only if the fractions $x_j$ of the respective
clusters that are intersected by $\tilde{S}$ form a vector $\mathbf{x} \in P(\mathcal{F})$, i.e. a convex combination of sets in $\mathcal{F}$.
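
To make the refinement operation concrete, the following sketch computes the cluster fractions $x_j$ of a set and tests membership in the refinement of one simple constraint, $\mathcal{F} = \{S : |S| \le 1\}$ on $X = [k]$, whose polytope is $\{\mathbf{x} \ge 0 : \sum_j x_j \le 1\}$. The encoding of refined elements as pairs `(i, j)` and the function names are assumptions of this sketch; a general constraint would require a membership test for $P(\mathcal{F})$ instead.

```python
from fractions import Fraction

def cluster_fractions(S, n, k):
    """x_j = |S ∩ (N x {j})| / n for a set S of pairs (i, j), i in range(n), j in range(k)."""
    counts = [0] * k
    for (_, j) in S:
        counts[j] += 1
    return [Fraction(c, n) for c in counts]

def in_refined_cardinality(S, n, k):
    """Membership in the refinement of F = {S : |S| <= 1} on X = [k].

    Since P(F) = {x >= 0 : sum_j x_j <= 1}, membership reduces to |S| <= n.
    """
    return sum(cluster_fractions(S, n, k)) <= 1
```

This reflects the remark that refinement preserves the type of constraint: a cardinality constraint refines to a cardinality constraint on the larger ground set.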

Our main result is that the symmetry gap translates automatically into hardness of approximation for
refined instances. We emphasize that this is a query-complexity lower bound, and hence independent of
assumptions such as $P \ne NP$.

Theorem 3. Let $\max\{f(S) : S \in \mathcal{F}\}$ be an instance of nonnegative (optionally monotone) submodular
maximization, totally symmetric with respect to $\mathcal{G}$, with symmetry gap $\gamma = \overline{OPT}/OPT$. Let $\mathcal{C}$ be the class
of instances $\max\{\tilde{f}(S) : S \in \tilde{\mathcal{F}}\}$ where $\tilde{f}$ is nonnegative (optionally monotone) submodular and $\tilde{\mathcal{F}}$ is a
refinement of $\mathcal{F}$. Then for every $\epsilon > 0$, any (even randomized) $(1 + \epsilon)\gamma$-approximation algorithm for the class
$\mathcal{C}$ would require exponentially many value queries to $\tilde{f}(S)$.

We remark that the result holds even if the class $\mathcal{C}$ is restricted to instances which are themselves symmetric
under a group related to $\mathcal{G}$ (see the discussion in Section 3, after the proofs of Theorems 3 and 4). On the
algorithmic side, submodular maximization seems easier for symmetric instances and in this case we obtain
optimal approximation factors, up to lower-order terms (see Section 5).

Our hardness construction yields impossibility results also for solving the continuous problem
$\max\{F(\mathbf{x}) : \mathbf{x} \in P(\mathcal{F})\}$. In the case of matroid constraints, this is easy to see, because an approximation to the continuous
problem gives the same approximation factor for the discrete problem (by pipage rounding, see Appendix B).
However, this phenomenon is more general and we can show that the value of a symmetry gap translates
into an inapproximability result for the multilinear optimization problem under any constraint.

Theorem 4. Let $\max\{f(S) : S \in \mathcal{F}\}$ be an instance of nonnegative (optionally monotone) submodular
maximization, totally symmetric with respect to $\mathcal{G}$, with symmetry gap $\gamma = \overline{OPT}/OPT$. Let $\mathcal{C}$ be the class
of instances $\max\{\tilde{f}(S) : S \in \tilde{\mathcal{F}}\}$ where $\tilde{f}$ is nonnegative (optionally monotone) submodular and $\tilde{\mathcal{F}}$ is a
refinement of $\mathcal{F}$. Then for every $\epsilon > 0$, any (even randomized) $(1 + \epsilon)\gamma$-approximation algorithm for the
multilinear relaxation $\max\{\tilde{F}(\mathbf{x}) : \mathbf{x} \in P(\tilde{\mathcal{F}})\}$ of problems in $\mathcal{C}$ would require exponentially many value
queries to $\tilde{f}(S)$.

Additions to the conference version and follow-up work. An extended abstract of this work appeared
in IEEE FOCS 2009 [29]. As mentioned above, the main theorem in [29] suffers from a technical flaw. This
does not affect the applications, but the general theorem in [29] is not correct. We provide a corrected version
of the main theorem with a complete proof (Theorem 3) and we extend this hardness result to the problem
of solving the multilinear relaxation (Theorem 4).

Subsequently, further work has been done which exploits the symmetry gap concept. In [23], it has
been proved using Theorem 3 that maximizing a nonnegative submodular function subject to a matroid
independence constraint cannot be done within a factor better than 0.478. Even in the case of a cardinality
constraint, $\max\{f(S) : |S| \le k\}$ cannot be approximated within a factor better than 0.491 [23]. In the
case of a matroid base constraint, assuming that the fractional base packing number is $\nu = \frac{k}{k-1}$ for some
$k \ge 2$, there is no $(1 - e^{-1/k} + \epsilon)$-approximation in the value oracle model [33], improving the hardness of
$(1 - \frac{1}{\nu} + \epsilon) = (\frac{1}{k} + \epsilon)$-approximation from this paper. These applications are not affected by the flaw in [29],
and they are implied by the corrected version of Theorem 3 here.

Very recently, it has been proved using the symmetry gap technique that combinatorial auctions with
submodular bidders do not admit any truthful-in-expectation $1/m^{\gamma}$-approximation, for some small fixed
$\gamma > 0$. This is the first non-trivial hardness result for truthful-in-expectation mechanisms for combinatorial
auctions; it separates the classes of monotone submodular functions and coverage functions, where a
truthful-in-expectation $(1 - 1/e)$-approximation is possible [5]. The proof is self-contained and does not formally refer
to [21].

Organization. The rest of the paper is organized as follows. In Section 2, we present applications of
our main hardness result (Theorem 3) to concrete cases; in particular, we show how it implies the hardness
statements in Theorems 1 and 2. In Section 3, we present the proofs of Theorem 3 and Theorem 4. In
Section 4, we prove the algorithmic results in Theorems 1 and 2. In Section 5, we discuss the special case
of symmetric instances. In the Appendix, we present a few basic facts concerning submodular functions, an
extension of pipage rounding to matroid independence polytopes (rather than matroid base polytopes), and
other technicalities that would hinder the main exposition.

2 From symmetry to inapproximability: applications



Before we get into the proof of Theorem 3, let us show how it can be applied to a number of specific problems.
Some of these are hardness results that were proved previously by an ad-hoc method. The last application
is a new one (Theorem 2).

Nonmonotone submodular maximization. Let $X = \{1, 2\}$ and for any $S \subseteq X$, $f(S) = 1$ if $|S| = 1$, and
0 otherwise. Consider the instance $\max\{f(S) : S \subseteq X\}$. In other words, this is the Max Cut problem on the
graph $K_2$. This instance exhibits a simple symmetry, the group of all (two) permutations on $\{1, 2\}$. We get
$OPT = F(1, 0) = F(0, 1) = 1$, while $\overline{OPT} = F(1/2, 1/2) = 1/2$. Hence, the symmetry gap is 1/2.
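
The two values in this example can be reproduced numerically. The sketch below is an illustration under two stated assumptions: OPT is taken over the vertices of the cube (a multilinear function attains its maximum over a box at a vertex), and the symmetric solutions $(t, t)$ are scanned over a coarse grid of $t$ rather than optimized exactly. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def multilinear(f, x):
    """Exact multilinear extension F(x) by enumeration over all subsets."""
    total = Fraction(0)
    for bits in product([0, 1], repeat=len(x)):
        prob = Fraction(1)
        for xi, b in zip(x, bits):
            prob *= xi if b else 1 - xi
        total += prob * f(frozenset(i for i, b in enumerate(bits) if b))
    return total

def cut_k2(S):
    """Max Cut objective on K2: value 1 iff exactly one vertex is chosen."""
    return 1 if len(S) == 1 else 0

# OPT over the vertices of [0,1]^2.
opt = max(multilinear(cut_k2, [Fraction(a), Fraction(b)])
          for a in (0, 1) for b in (0, 1))
# Best symmetric point (t, t) on a grid of step 1/10; F(t, t) = 2t(1 - t).
sym = max(multilinear(cut_k2, [Fraction(i, 10)] * 2) for i in range(11))
gap = sym / opt
```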




Figure 1: Symmetric instance for nonmonotone submodular maximization: Max Cut on the graph $K_2$. The
white set denotes the optimal solution, while $\bar{\mathbf{x}}$ is the (unique) symmetric solution.



Since $f(S)$ is nonnegative submodular and there is no constraint on $S \subseteq X$, this will be the case for any
refinement of the instance as well. Theorem 3 implies immediately the following: any algorithm achieving
a $(\frac{1}{2} + \epsilon)$-approximation for nonnegative (nonmonotone) submodular maximization requires exponentially
many value queries (which was previously known [10]). Note that a "trivial instance" implies a non-trivial
hardness result. This is typically the case in applications of Theorem 3.

The same symmetry gap holds if we impose some simple constraints: the problems $\max\{f(S) : |S| \le 1\}$
and $\max\{f(S) : |S| = 1\}$ have the same symmetry gap as above. Hence, the hardness threshold of 1/2
also holds for nonmonotone submodular maximization under cardinality constraints of the type $|S| \le n/2$,
or $|S| = n/2$. This proves the hardness part of Theorem 1. This can be derived quite easily from the
construction of [10] as well.

Monotone submodular maximization. Let $X = [k]$ and $f(S) = \min\{|S|, 1\}$. Consider the instance
$\max\{f(S) : |S| \le 1\}$. This instance is invariant under all permutations on $[k]$, the symmetric group $\mathbf{S}_k$. Note
that the instance is totally symmetric (Def. 2) with respect to $\mathbf{S}_k$, since the feasibility constraint $|S| \le 1$
depends only on the symmetrized vector $\bar{\mathbf{1}}_S = (\frac{1}{k}|S|, \ldots, \frac{1}{k}|S|)$. We get $OPT = F(1, 0, \ldots, 0) = 1$, while
$\overline{OPT} = F(1/k, 1/k, \ldots, 1/k) = 1 - (1 - 1/k)^k$.




Figure 2: Symmetric instance for monotone submodular maximization. 

Here, $f(S)$ is monotone submodular and any refinement of $\mathcal{F}$ is a set system of the type $\tilde{\mathcal{F}} = \{S : |S| \le \ell\}$.
Based on our theorem, this implies that any approximation better than $1 - (1 - 1/k)^k$ for monotone submodular
maximization subject to a cardinality constraint would require exponentially many value queries. Since this
holds for any fixed $k$, we get the same hardness result for any $\beta > \lim_{k \to \infty}(1 - (1 - 1/k)^k) = 1 - 1/e$
(which was previously known [22]).
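
The convergence of the symmetry gap to $1 - 1/e$ is easy to observe numerically. The following sketch simply evaluates the closed form $1 - (1 - 1/k)^k$ from the text in floating point; the function name and the chosen values of $k$ are illustrative.

```python
import math

def symmetric_value(k):
    """F(1/k, ..., 1/k) for f(S) = min(|S|, 1): the probability that the random set is nonempty."""
    return 1.0 - (1.0 - 1.0 / k) ** k

values = {k: symmetric_value(k) for k in (1, 2, 5, 10, 100, 1000)}
limit = 1.0 - 1.0 / math.e
```

The values decrease with $k$ and stay above the limit, which is why the hardness threshold is obtained only in the limit $k \to \infty$.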



Submodular maximization over matroid bases. Let $X = A \cup B$, $A = \{a_1, \ldots, a_k\}$, $B = \{b_1, \ldots, b_k\}$
and $\mathcal{F} = \{S : |S \cap A| = 1 \ \& \ |S \cap B| = k - 1\}$. We define $f(S) = \sum_{i=1}^{k} f_i(S)$ where $f_i(S) = 1$ if $a_i \in S \ \& \ b_i \notin S$,
and 0 otherwise. This instance can be viewed as a Maximum Directed Cut problem on a graph of $k$ disjoint
arcs, under the constraint that exactly one arc tail and $k - 1$ arc heads should be on the left-hand side ($S$).
An optimal solution is for example $S = \{a_1, b_2, b_3, \ldots, b_k\}$, which gives $OPT = 1$. The symmetry here is that
we can apply the same permutation to $A$ and $B$ simultaneously. Again, the feasibility of a set $S$ depends
only on the symmetrized vector $\bar{\mathbf{1}}_S$: in fact $S \in \mathcal{F}$ if and only if $\bar{\mathbf{1}}_S = (\frac{1}{k}, \ldots, \frac{1}{k}, 1 - \frac{1}{k}, \ldots, 1 - \frac{1}{k})$. There is a
unique symmetric solution $\bar{\mathbf{x}} = (\frac{1}{k}, \ldots, \frac{1}{k}, 1 - \frac{1}{k}, \ldots, 1 - \frac{1}{k})$, and
$\overline{OPT} = F(\bar{\mathbf{x}}) = \mathbf{E}[f(\hat{\bar{\mathbf{x}}})] = \sum_{i=1}^{k} \mathbf{E}[f_i(\hat{\bar{\mathbf{x}}})] = \frac{1}{k}$
(since each arc appears in the directed cut induced by $\hat{\bar{\mathbf{x}}}$ with probability $\frac{1}{k^2}$).
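
The value of the symmetric solution in this instance can be verified by direct enumeration for small $k$. In the sketch below, the encoding of $a_i$ as index $i - 1$ and $b_i$ as index $k + i - 1$ is an assumption made for the example, as are the function names.

```python
from fractions import Fraction
from itertools import product

def directed_cut(S, k):
    """Number of arcs (a_i, b_i) with a_i in S and b_i not in S.

    Indices 0..k-1 encode a_1..a_k and indices k..2k-1 encode b_1..b_k.
    """
    return sum(1 for i in range(k) if i in S and (k + i) not in S)

def multilinear(f, x):
    """Exact multilinear extension F(x) by enumeration over all subsets."""
    total = Fraction(0)
    for bits in product([0, 1], repeat=len(x)):
        prob = Fraction(1)
        for xi, b in zip(x, bits):
            prob *= xi if b else 1 - xi
        total += prob * f(frozenset(i for i, b in enumerate(bits) if b))
    return total

def symmetric_value(k):
    """F at the symmetric point (1/k, ..., 1/k, 1 - 1/k, ..., 1 - 1/k)."""
    x_bar = [Fraction(1, k)] * k + [1 - Fraction(1, k)] * k
    return multilinear(lambda S: directed_cut(S, k), x_bar)
```

Each arc is cut with probability $(1/k) \cdot (1/k)$, and summing over the $k$ arcs gives $1/k$, which the enumeration confirms.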



Figure 3: Symmetric instance for submodular maximization over matroid bases.

The refined instances are instances of (nonmonotone) submodular maximization over the bases of a
matroid, where the ground set is partitioned into $A \cup B$ and we should take a $\frac{1}{k}$-fraction of $A$ and a
$(1 - \frac{1}{k})$-fraction of $B$. (This means that the fractional packing number of bases is $\nu = \frac{k}{k-1}$.) Our theorem implies
that for this class of instances, an approximation better than $1/k$ is impossible. This proves the hardness
part of Theorem 2.

Observe that in all the cases mentioned above, the multilinear relaxation is equivalent to the original
problem, in the sense that a fractional solution can be rounded without any loss in the objective value. This
implies that the same hardness factors apply to solving the multilinear relaxation of the respective problems.
In particular, using the last result (for matroid bases), we obtain that the multilinear optimization problem
$\max\{F(\mathbf{x}) : \mathbf{x} \in P\}$ does not admit a constant factor for nonnegative submodular functions and matroid
base polytopes. (We remark that a $(1 - 1/e)$-approximation can be achieved for any monotone submodular
function and any solvable polytope [17].)

As Theorem 4 shows, this holds more generally: any symmetry gap construction gives an inapproximability
result for solving the multilinear optimization problem $\max\{F(\mathbf{x}) : \mathbf{x} \in P\}$. This in fact implies limits on
what hardness results we can possibly hope for using this technique. For instance, we cannot prove using the
symmetry gap that the monotone submodular maximization problem subject to the intersection of $k$ matroid
constraints does not admit a constant factor, because we would also prove that the respective multilinear
relaxation does not admit such an approximation. But we know from [27] that a $(1 - 1/e)$-approximation is
possible for the multilinear problem in this case.

Hence, the hardness arising from the symmetry gap is related to the difficulty of solving the multilinear 
optimization problem rather than the difficulty of rounding a fractional solution. Thus this technique is 
primarily suited to optimization problems where the multilinear optimization problem captures closely the 
original discrete problem. 

3 From symmetry to inapproximability: proof 

At a high level, our proof resembles the constructions of [10, 19]. We construct instances based on continuous
functions $F(\mathbf{x})$, $G(\mathbf{x})$, whose optima differ by a gap for which we want to prove hardness. Then we show
that after a certain perturbation, the two instances are very hard to distinguish. This paper generalizes the
ideas of [10, 19] and brings two new ingredients. First, we show that the functions $F(\mathbf{x})$, $G(\mathbf{x})$, which are
"pulled out of the hat" in [10, 19], can be produced in a natural way from the multilinear relaxation of the
respective problem, using the notion of a symmetry gap. Secondly, the functions $F(\mathbf{x})$, $G(\mathbf{x})$ are perturbed
in a way that makes them indistinguishable and this forms the main technical part of the proof. In [10],
this step is quite simple. In [19], the perturbation is more complicated, but still relies on properties of the
functions $F(\mathbf{x})$, $G(\mathbf{x})$ specific to that application. The construction that we present here (Lemma 2) uses the
symmetry properties of a fixed instance in a generic fashion.

First, let us present an outline of our construction. Given an instance $\max\{f(S) : S \in \mathcal{F}\}$ exhibiting
a symmetry gap $\gamma$, we consider two smooth submodular functions, $F(\mathbf{x})$ and $G(\mathbf{x})$. The first one is the
multilinear extension $F(\mathbf{x}) = \mathbf{E}[f(\hat{\mathbf{x}})]$, while the second one is its symmetrized version $G(\mathbf{x}) = F(\bar{\mathbf{x}})$. We
modify these functions slightly so that we obtain functions $\hat{F}(\mathbf{x})$ and $\hat{G}(\mathbf{x})$ with the following property: For
any vector $\mathbf{x}$ which is close to its symmetrized version $\bar{\mathbf{x}}$, $\hat{F}(\mathbf{x}) = \hat{G}(\mathbf{x})$. The functions $\hat{F}(\mathbf{x})$, $\hat{G}(\mathbf{x})$ induce
instances of submodular maximization on the refined ground sets. The way we define discrete instances
based on $\hat{F}(\mathbf{x})$, $\hat{G}(\mathbf{x})$ is natural, using the following lemma. Essentially, we interpret the fractional variables
as fractions of clusters in the refined instance.

2 "Smooth submodularity" means the condition $\frac{\partial^2 F}{\partial x_i \partial x_j} \le 0$ for all $i, j$.

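Lemma 1. Let $F : [0,1]^X \to \mathbb{R}$ be a function with absolutely continuous first partial derivatives. Let
$N = [n]$, $n \ge 1$, and define $f : 2^{N \times X} \to \mathbb{R}$ so that $f(S) = F(\mathbf{x})$ where $x_i = \frac{1}{n} |S \cap (N \times \{i\})|$. Then

1. If $\frac{\partial F}{\partial x_i} \ge 0$ everywhere for each $i$, then $f$ is monotone.

2. If $\frac{\partial^2 F}{\partial x_i \partial x_j} \le 0$ almost everywhere for all $i, j$, then $f$ is submodular.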

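Proof. First, assume $\frac{\partial F}{\partial x_i} \ge 0$ everywhere for all $i$. This implies that $F$ is non-decreasing in every coordinate,
i.e. $F(\mathbf{x}) \le F(\mathbf{y})$ whenever $\mathbf{x} \le \mathbf{y}$. This means that $f(S) \le f(T)$ whenever $S \subseteq T$.

Next, assume $\frac{\partial F}{\partial x_i}$ is absolutely continuous for each $i$ and $\frac{\partial^2 F}{\partial x_j \partial x_i} \le 0$ almost everywhere for all $i, j$. We want
to prove that $\frac{\partial F}{\partial x_i}\big|_{\mathbf{x}} \ge \frac{\partial F}{\partial x_i}\big|_{\mathbf{y}}$ whenever $\mathbf{x} \le \mathbf{y}$, which implies that the marginal values of $f$ are non-increasing.

Let $\mathbf{x} \le \mathbf{y}$, fix $\delta > 0$ arbitrarily small, and pick $\mathbf{x}', \mathbf{y}'$ such that $\|\mathbf{x}' - \mathbf{x}\| < \delta$, $\|\mathbf{y}' - \mathbf{y}\| < \delta$, $\mathbf{x}' \le \mathbf{y}'$ and on
the line segment $[\mathbf{x}', \mathbf{y}']$, we have $\frac{\partial^2 F}{\partial x_j \partial x_i} \le 0$ except for a set of (1-dimensional) measure zero. If such a pair
of points $\mathbf{x}', \mathbf{y}'$ does not exist, it means that there are sets $A, B$ of positive measure such that $\mathbf{x} \in A$, $\mathbf{y} \in B$
and for any $\mathbf{x}' \in A$, $\mathbf{y}' \in B$, the line segment $[\mathbf{x}', \mathbf{y}']$ contains a subset of positive (1-dimensional) measure
where $\frac{\partial^2 F}{\partial x_j \partial x_i}$ for some $j$ is positive or undefined. This would imply that $[0,1]^X$ contains a subset of positive
measure where $\frac{\partial^2 F}{\partial x_j \partial x_i}$ for some $j$ is positive or undefined, which we assume is not the case.

Therefore, there is a pair of points $\mathbf{x}', \mathbf{y}'$ as described above. We compare $\frac{\partial F}{\partial x_i}\big|_{\mathbf{x}'}$ and $\frac{\partial F}{\partial x_i}\big|_{\mathbf{y}'}$ by integrating
along the line segment $[\mathbf{x}', \mathbf{y}']$. Since $\frac{\partial F}{\partial x_i}$ is absolutely continuous and $\frac{\partial}{\partial x_j} \frac{\partial F}{\partial x_i} = \frac{\partial^2 F}{\partial x_j \partial x_i} \le 0$ along this line
segment for all $j$ except for a set of measure zero, we obtain $\frac{\partial F}{\partial x_i}\big|_{\mathbf{x}'} \ge \frac{\partial F}{\partial x_i}\big|_{\mathbf{y}'}$. This is true for $\mathbf{x}', \mathbf{y}'$ arbitrarily
close to $\mathbf{x}, \mathbf{y}$, and hence by continuity of $\frac{\partial F}{\partial x_i}$, we get $\frac{\partial F}{\partial x_i}\big|_{\mathbf{x}} \ge \frac{\partial F}{\partial x_i}\big|_{\mathbf{y}}$. This implies that the marginal values of
$f$ are non-increasing. □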
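
The discretization in Lemma 1 can be tried out on a toy example. The sketch below builds $f$ from a given $F$ via cluster fractions and checks submodularity by brute force; the two test functions `F_cut` and `F_bad` and all helper names are assumptions chosen for illustration, not objects from the paper.

```python
from fractions import Fraction
from itertools import combinations

def discretize(F, n, k):
    """Return f on subsets of N x X with f(S) = F(x), where x_i = |S ∩ (N x {i})| / n."""
    def f(S):
        counts = [0] * k
        for (_, i) in S:
            counts[i] += 1
        return F([Fraction(c, n) for c in counts])
    return f

def is_submodular(f, ground):
    """Brute-force check of f(S | T) + f(S & T) <= f(S) + f(T)."""
    sets = [frozenset(c) for r in range(len(ground) + 1)
            for c in combinations(ground, r)]
    return all(f(S | T) + f(S & T) <= f(S) + f(T) for S in sets for T in sets)

def F_cut(x):
    """Multilinear extension of Max Cut on K2; mixed second partial is -2, pure ones are 0."""
    return x[0] * (1 - x[1]) + x[1] * (1 - x[0])

def F_bad(x):
    """Violates the hypothesis of Lemma 1: the mixed second partial derivative is +1."""
    return x[0] * x[1]

n, k = 2, 2
ground = [(i, j) for i in range(n) for j in range(k)]
```

With these definitions, the function induced by `F_cut` passes the submodularity check, as Lemma 1 predicts, while the one induced by `F_bad` fails it.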

The way we construct $\hat{F}(\mathbf{x})$, $\hat{G}(\mathbf{x})$ is such that, given a large enough refinement of the ground set, it is
impossible to distinguish the instances corresponding to $\hat{F}(\mathbf{x})$ and $\hat{G}(\mathbf{x})$. As we argue more precisely later,
this holds because under an unknown labeling of the ground set, all queries with high probability fall in the
region where $\hat{F}(\mathbf{x}) = \hat{G}(\mathbf{x})$. The following lemma gives the precise properties of $\hat{F}(\mathbf{x})$ and $\hat{G}(\mathbf{x})$ that we need.

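Lemma 2. Consider a function $f : 2^X \to \mathbb{R}_+$ invariant under a group of permutations $\mathcal{G}$ on the ground
set $X$. Let $F(\mathbf{x}) = \mathbf{E}[f(\hat{\mathbf{x}})]$, $\bar{\mathbf{x}} = \mathbf{E}_{\sigma \in \mathcal{G}}[\sigma(\mathbf{x})]$, and fix any $\epsilon > 0$. Then there is $\delta > 0$ and functions
$\hat{F}, \hat{G} : [0,1]^X \to \mathbb{R}_+$ (which are also symmetric with respect to $\mathcal{G}$), satisfying:

1. For all $\mathbf{x} \in [0,1]^X$, $\hat{G}(\mathbf{x}) = \hat{F}(\bar{\mathbf{x}})$.

2. For all $\mathbf{x} \in [0,1]^X$, $|\hat{F}(\mathbf{x}) - F(\mathbf{x})| \le \epsilon$.

3. Whenever $\|\mathbf{x} - \bar{\mathbf{x}}\|^2 \le \delta$, $\hat{F}(\mathbf{x}) = \hat{G}(\mathbf{x})$ and the value depends only on $\bar{\mathbf{x}}$.

4. The first partial derivatives of $\hat{F}, \hat{G}$ are absolutely continuous.

5. If $f$ is monotone, then $\frac{\partial \hat{F}}{\partial x_i} \ge 0$ and $\frac{\partial \hat{G}}{\partial x_i} \ge 0$ everywhere.

6. If $f$ is submodular, then $\frac{\partial^2 \hat{F}}{\partial x_i \partial x_j} \le 0$ and $\frac{\partial^2 \hat{G}}{\partial x_i \partial x_j} \le 0$ almost everywhere.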

The proof of this lemma is the main technical part of this paper and we defer it to the end of this section. 
Assuming this lemma, we first finish the proof of the main theorem. We prove the following. 

$^3$ A function $F : [0,1]^X \to \mathbb{R}$ is absolutely continuous, if $\forall \epsilon > 0$; $\exists \delta > 0$; $\sum_{i=1}^{t} \|x_i - y_i\| < \delta \Rightarrow \sum_{i=1}^{t} |F(x_i) - F(y_i)| < \epsilon$.

Lemma 3. Let $\hat F, \hat G$ be the two functions provided by Lemma 2. For a parameter $n \in \mathbb{Z}_+$ and $N = [n]$, define two discrete functions $\hat f, \hat g : 2^{N \times X} \to \mathbb{R}_+$ as follows: Let $\sigma^{(i)}$ be an arbitrary permutation in $\mathcal{G}$ for each $i \in N$. For every set $S \subseteq N \times X$, we define a vector $\xi(S) \in [0,1]^X$ by

$$\xi_j(S) = \frac{1}{n} \left| \{ i \in N : (i, \sigma^{(i)}(j)) \in S \} \right|.$$

Let us define:

$$\hat f(S) = \hat F(\xi(S)), \qquad \hat g(S) = \hat G(\xi(S)).$$

In addition, let $\tilde{\mathcal{F}} = \{S : \xi(S) \in P(\mathcal{F})\}$ be a feasibility constraint such that the condition $S \in \mathcal{F}$ depends only on the symmetrized vector $\overline{1_S}$. Then deciding whether an instance given by value/membership oracles is $\max\{\hat f(S) : S \in \tilde{\mathcal{F}}\}$ or $\max\{\hat g(S) : S \in \tilde{\mathcal{F}}\}$ (even by a randomized algorithm, with any constant probability of success) requires an exponential number of queries.

Proof. Let $\sigma^{(i)} \in \mathcal{G}$ be chosen independently at random for each $i \in N$ and consider the instances $\max\{\hat f(S) : S \in \tilde{\mathcal{F}}\}$, $\max\{\hat g(S) : S \in \tilde{\mathcal{F}}\}$ as described in the lemma. We show that every deterministic algorithm will follow the same computation path and return the same answer on both instances, with high probability. By Yao's principle, this means that every randomized algorithm returns the same answer for the two instances with high probability, for some particular $\sigma^{(i)} \in \mathcal{G}$.

The feasible sets in the refined instance, $S \in \tilde{\mathcal{F}}$, are such that the respective vector $\xi(S)$ is in the polytope $P(\mathcal{F})$. Since the instance is totally symmetric, the condition $S \in \mathcal{F}$ depends only on the symmetrized vector $\overline{1_S}$. Hence, the condition $\xi(S) \in P(\mathcal{F})$ depends only on the symmetrized vector $\bar\xi(S)$. Therefore, $S \in \tilde{\mathcal{F}} \Leftrightarrow \xi(S) \in P(\mathcal{F}) \Leftrightarrow \bar\xi(S) \in P(\mathcal{F})$. We have $\bar\xi(S)_j = E_{\sigma \in \mathcal{G}}\left[\frac{1}{n} |\{i \in N : (i, \sigma^{(i)}(\sigma(j))) \in S\}|\right]$. The distribution of $\sigma^{(i)} \circ \sigma$ is the same as that of $\sigma$, i.e. uniform over $\mathcal{G}$. Hence, $\bar\xi(S)_j$ and consequently the condition $S \in \tilde{\mathcal{F}}$ does not depend on $\sigma^{(i)}$ in any way. Intuitively, an algorithm cannot learn any information about the permutations $\sigma^{(i)}$ by querying the feasibility oracle, since feasibility does not depend on $\sigma^{(i)}$.

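The main part of the proof is to show that even queries to the objective function are unlikely to reveal any information about the permutations $\sigma^{(i)}$. The key observation is that for any fixed query $Q$ to the objective function, the associated vector $q = \xi(Q)$ is very likely to be close to its symmetrized version $\bar q$. To see this, consider a query $Q$. The associated vector $q = \xi(Q)$ is determined by

$$q_j = \frac{1}{n} \left| \{ i \in N : (i, \sigma^{(i)}(j)) \in Q \} \right| = \frac{1}{n} \sum_{i=1}^{n} Q_{ij}$$

where $Q_{ij}$ is an indicator variable of the event $(i, \sigma^{(i)}(j)) \in Q$. This is a random event due to the randomness in $\sigma^{(i)}$. We have

$$E[Q_{ij}] = \Pr[Q_{ij} = 1] = \Pr_{\sigma^{(i)} \in \mathcal{G}}[(i, \sigma^{(i)}(j)) \in Q].$$

Adding up these expectations over $i \in N$, we get

$$\sum_{i \in N} E[Q_{ij}] = \sum_{i \in N} \Pr_{\sigma^{(i)} \in \mathcal{G}}[(i, \sigma^{(i)}(j)) \in Q] = E_{\sigma \in \mathcal{G}}\left[ |\{ i \in N : (i, \sigma(j)) \in Q \}| \right].$$

For the purposes of expectation, the independence of $\sigma^{(1)}, \ldots, \sigma^{(n)}$ is irrelevant and that is why we replace them by one random permutation $\sigma$. On the other hand, consider the symmetrized vector $\bar q$, with coordinates

$$\bar q_j = E_{\sigma \in \mathcal{G}}[q_{\sigma(j)}] = \frac{1}{n} E_{\sigma \in \mathcal{G}}\left[ |\{ i \in N : (i, \sigma^{(i)}(\sigma(j))) \in Q \}| \right] = \frac{1}{n} E_{\sigma \in \mathcal{G}}\left[ |\{ i \in N : (i, \sigma(j)) \in Q \}| \right]$$

using the fact that the distribution of $\sigma^{(i)} \circ \sigma$ is the same as the distribution of $\sigma$, namely uniformly random over $\mathcal{G}$. Note that the vector $q$ depends on the random permutations $\sigma^{(i)}$ but the symmetrized vector $\bar q$ does not; this will also be useful in the following. For now, we summarize that

$$\bar q_j = \frac{1}{n} \sum_{i=1}^{n} E[Q_{ij}].$$

Since each permutation $\sigma^{(i)}$ is chosen independently, the random variables $\{Q_{ij} : 1 \le i \le n\}$ are independent (for a fixed $j$). We can apply Chernoff's bound (see e.g. [T], Corollary A.1.7):

$$\Pr\left[ \left| \sum_{i=1}^{n} Q_{ij} - \sum_{i=1}^{n} E[Q_{ij}] \right| > a \right] < 2 e^{-2a^2/n}.$$

Using $q_j = \frac{1}{n} \sum_{i=1}^{n} Q_{ij}$, $\bar q_j = \frac{1}{n} \sum_{i=1}^{n} E[Q_{ij}]$ and setting $a = n\sqrt{\delta/|X|}$, we obtain

$$\Pr\left[ |q_j - \bar q_j| > \sqrt{\delta/|X|} \right] < 2 e^{-2n\delta/|X|}.$$

By the union bound,

$$\Pr\left[ \|q - \bar q\|^2 > \delta \right] \le \sum_{j \in X} \Pr\left[ |q_j - \bar q_j|^2 > \delta/|X| \right] < 2|X| e^{-2n\delta/|X|}.$$

Note that while $\delta$ and $|X|$ are constants, $n$ grows as the size of the refinement and hence the probability is exponentially small in the size of the ground set $N \times X$.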
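The concentration argument above can be illustrated by a small simulation. The following is a minimal sketch under assumptions that are not part of the paper: the symmetry group is taken to be the full symmetric group on a 3-element ground set, the function name `simulate` is hypothetical, and the query $Q$ is one fixed, arbitrarily chosen set. It computes $D(q) = \|q - \bar q\|^2$ for hidden random permutations $\sigma^{(i)}$.

```python
import random
from itertools import permutations

def simulate(n, x_size=3, seed=0):
    """Return ||q - qbar||^2 for one fixed query under random hidden shuffles."""
    rng = random.Random(seed)
    # Assumed symmetry group for this sketch: all permutations of X.
    group = list(permutations(range(x_size)))
    # One independent random group element per copy index i;
    # sigma[i][j] plays the role of sigma^(i)(j).
    sigma = [rng.choice(group) for _ in range(n)]
    # A fixed query, chosen without knowledge of sigma: label 0 in every copy.
    query = {(i, 0) for i in range(n)}
    # q_j = (1/n) * |{ i : (i, sigma^(i)(j)) in Q }|
    q = [sum((i, sigma[i][j]) in query for i in range(n)) / n
         for j in range(x_size)]
    # Under the full symmetric group, every coordinate of the symmetrized
    # vector equals the average of the coordinates of q.
    qbar = [sum(q) / x_size] * x_size
    return sum((a - b) ** 2 for a, b in zip(q, qbar))

for n in (10, 100, 1000, 20000):
    print(n, simulate(n))
```

As $n$ grows, the printed distance should shrink towards zero, which is the behavior the Chernoff and union bounds predict for any fixed query.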

Define $D(q) = \|q - \bar q\|^2$. As long as $D(q) \le \delta$ for every query issued by the algorithm, the answers do not depend on the randomness of the input. This is because then the values of $\hat F(q)$ and $\hat G(q)$ depend only on $\bar q$, which is independent of the random permutations $\sigma^{(i)}$, as we argued above. Therefore, assuming that $D(q) \le \delta$ for each query, the algorithm will always follow the same path of computation and issue the same sequence of queries $\mathcal{S}$. (Note that this is just a fixed sequence which can be written down before we start running the algorithm.) Assume that $|\mathcal{S}|$ is subexponential in $n$. It happens with high probability that $D(q) \le \delta$ for all $Q \in \mathcal{S}$. Then, the algorithm indeed follows this path of computation and gives the same answer. In particular, the answer does not depend on whether the instance is $\max\{\hat f(S) : S \in \tilde{\mathcal{F}}\}$ or $\max\{\hat g(S) : S \in \tilde{\mathcal{F}}\}$. □

Proof of Theorem 3. Fix an $\epsilon > 0$. Given an instance $\max\{f(S) : S \in \mathcal{F}\}$ totally symmetric under $\mathcal{G}$, let $\hat F, \hat G : [0,1]^X \to \mathbb{R}$ be the two functions provided by Lemma 2. We choose a large number $n$ and consider a refinement $\tilde{\mathcal{F}}$ on the ground set $N \times X$, where $N = [n]$. We define discrete instances of submodular maximization $\max\{\hat f(S) : S \in \tilde{\mathcal{F}}\}$ and $\max\{\hat g(S) : S \in \tilde{\mathcal{F}}\}$. As in Lemma 3, for each $i \in N$ we choose a random permutation $\sigma^{(i)} \in \mathcal{G}$. This can be viewed as a random shuffle of the labeling of the ground set before we present it to an algorithm. For every set $S \subseteq N \times X$, we define a vector $\xi(S) \in [0,1]^X$ by

$$\xi_j(S) = \frac{1}{n} \left| \{ i \in N : (i, \sigma^{(i)}(j)) \in S \} \right|.$$

In other words, $\xi_j(S)$ measures the fraction of copies of element $j$ contained in $S$; however, for each $i$ the $i$-copies of all elements are shuffled by $\sigma^{(i)}$. Next, we define

$$\hat f(S) = \hat F(\xi(S)), \qquad \hat g(S) = \hat G(\xi(S)).$$

We claim that $\hat f$ and $\hat g$ are submodular (for any fixed $\xi$). Note that the effect of $\sigma^{(i)}$ is just a renaming (or shuffling) of the elements of $N \times X$, and hence for the purpose of proving submodularity we can assume that $\sigma^{(i)}$ is the identity for all $i$. Then, $\xi_j(S) = \frac{1}{n} |S \cap (N \times \{j\})|$. Due to Lemma 1, the property $\frac{\partial^2 \hat F}{\partial x_i \partial x_j} \le 0$ (almost everywhere) implies that $\hat f$ is submodular. In addition, if the original instance was monotone, then $\frac{\partial \hat F}{\partial x_j} \ge 0$ and $\hat f$ is monotone. The same holds for $\hat g$.

The value of $\hat g(S)$ for any feasible solution $S \in \tilde{\mathcal{F}}$ is bounded by $\hat g(S) = \hat G(\xi(S)) = \hat F(\bar\xi(S)) \le \overline{OPT} + \epsilon$. On the other hand, let $x^*$ denote a point where the optimum of the continuous problem $\max\{F(x) : x \in P(\mathcal{F})\}$ is attained, so that $\hat F(x^*) \ge OPT - \epsilon$. For a large enough $n$, we can approximate the point $x^*$ arbitrarily closely by a rational vector with $n$ in the denominator, which corresponds to a discrete solution $S^* \in \tilde{\mathcal{F}}$ whose value $\hat f(S^*)$ is at least, say, $OPT - 2\epsilon$. Hence, the ratio between the optima of the discrete optimization problems $\max\{\hat g(S) : S \in \tilde{\mathcal{F}}\}$ and $\max\{\hat f(S) : S \in \tilde{\mathcal{F}}\}$ can be made at most $\frac{\overline{OPT} + \epsilon}{OPT - 2\epsilon}$, i.e. arbitrarily close to the symmetry gap $\gamma = \frac{\overline{OPT}}{OPT}$.

By Lemma 3, distinguishing the two instances $\max\{\hat f(S) : S \in \tilde{\mathcal{F}}\}$ and $\max\{\hat g(S) : S \in \tilde{\mathcal{F}}\}$, even by a randomized algorithm, requires an exponential number of value queries. Therefore, we cannot estimate the optimum within a factor better than $\gamma$. □

Next, we prove Theorem 4 (again assuming Lemma 2), i.e. an analogous hardness result for solving the multilinear optimization problem.

Proof of Theorem 4. Given a symmetric instance $\max\{f(S) : S \in \mathcal{F}\}$ and $\epsilon > 0$, we construct refined and modified instances $\max\{\hat f(S) : S \in \tilde{\mathcal{F}}\}$, $\max\{\hat g(S) : S \in \tilde{\mathcal{F}}\}$, derived from the continuous functions $\hat F, \hat G$ provided by Lemma 2, exactly as we did in the proof of Theorem 3. Lemma 3 states that these two instances cannot be distinguished using a subexponential number of value queries. Furthermore, the gap between the two modified instances corresponds to the symmetry gap $\gamma$ of the original instance: $\max\{\hat f(S) : S \in \tilde{\mathcal{F}}\} \ge OPT - 2\epsilon$ and $\max\{\hat g(S) : S \in \tilde{\mathcal{F}}\} \le \overline{OPT} + \epsilon = \gamma OPT + \epsilon$.

Now we consider the multilinear relaxations of the two refined instances, $\max\{\check F(x) : x \in P(\tilde{\mathcal{F}})\}$ and $\max\{\check G(x) : x \in P(\tilde{\mathcal{F}})\}$. Note that $\check F, \check G$ (although related to $\hat F, \hat G$) are not exactly the same as the functions $\hat F, \hat G$; in particular, they are defined on a larger (refined) domain. However, we show that the gap between the optima of the two instances remains the same.

First, the value of $\max\{\check F(x) : x \in P(\tilde{\mathcal{F}})\}$ is at least the optimum of the discrete problem, $\max\{\hat f(S) : S \in \tilde{\mathcal{F}}\}$, which is at least $OPT - 2\epsilon$ as in the proof of Theorem 3. The value of $\max\{\check G(x) : x \in P(\tilde{\mathcal{F}})\}$ can be analyzed as follows. For any fractional solution $x \in P(\tilde{\mathcal{F}})$, the value of $\check G(x)$ is the expectation $E[\hat g(\hat x)]$, where $\hat x$ is obtained by independently rounding the coordinates of $x$ to $\{0,1\}$. Recall that $\hat g$ is obtained by discretizing the continuous function $\hat G$ (using Lemma 1). In particular, $\hat g(S) = \hat G(\tilde x)$ where $\tilde x_i = \frac{1}{n} |S \cap (N \times \{i\})|$ is the fraction of the respective cluster contained in $S$, and $|N| = n$ is the size of each cluster (the refinement parameter). If $1_S = \hat x$, i.e. $S$ is chosen by independent sampling with probabilities according to $x$, then for large $n$ the fractions $\frac{1}{n} |S \cap (N \times \{i\})|$ will be strongly concentrated around their expectation. As $\hat G$ is continuous, we get $\lim_{n \to \infty} E[\hat g(\hat x)] = \lim_{n \to \infty} E[\hat G(\tilde x)] = \hat G(E[\tilde x]) = \hat G(\bar x)$. Here, $\bar x$ is the vector $x$ projected back to the original ground set $X$, where the coordinates of each cluster have been averaged. By construction of the refinement, if $x \in P(\tilde{\mathcal{F}})$ then $\bar x$ is in the polytope corresponding to the original instance, $P(\mathcal{F})$. Therefore, $\hat G(\bar x) \le \max\{\hat G(x) : x \in P(\mathcal{F})\} \le \gamma OPT + \epsilon$. For large enough $n$, this means that $\max\{\check G(x) : x \in P(\tilde{\mathcal{F}})\} \le \gamma OPT + 2\epsilon$. This holds for an arbitrarily small fixed $\epsilon > 0$, and hence the gap between the instances $\max\{\check F(x) : x \in P(\tilde{\mathcal{F}})\}$ and $\max\{\check G(x) : x \in P(\tilde{\mathcal{F}})\}$ (which cannot be distinguished) can be made arbitrarily close to $\gamma$. □

Hardness for symmetric instances. We remark that since Lemma 2 provides functions $\hat F$ and $\hat G$ symmetric under $\mathcal{G}$, the refined instances that we define are invariant with respect to the following symmetries: permute the copies of each element in an arbitrary way, and permute the classes of copies according to any permutation $\sigma \in \mathcal{G}$. This means that our hardness results also hold for instances satisfying such symmetry properties.

It remains to prove Lemma 2. Before we move to the final construction of $\hat F(x)$ and $\hat G(x)$, we construct as an intermediate step a function $\tilde F(x)$ which is helpful in the analysis.

Construction. Let us construct a function $\tilde F(x)$ which satisfies the following:

• For $x$ "sufficiently close" to $\bar x$, $\tilde F(x) = G(x)$.

• For $x$ "sufficiently far away" from $\bar x$, $\tilde F(x) \approx F(x)$.

• The function $\tilde F(x)$ is "approximately" smooth submodular.

Once we have $\tilde F(x)$, we can fix it to obtain a smooth submodular function $\hat F(x)$, which is still close to the original function $F(x)$. We also fix $G(x)$ in the same way, to obtain a function $\hat G(x)$ which is equal to $\hat F(x)$ whenever $x$ is close to $\bar x$. We defer this step until the end.

We define $\tilde F(x)$ as a convex linear combination of $F(x)$ and $G(x)$, guided by a "smooth transition" function, depending on the distance of $x$ from $\bar x$. The form that we use is the following:$^4$

$$\tilde F(x) = (1 - \phi(D(x))) F(x) + \phi(D(x)) G(x)$$

where $\phi : \mathbb{R}_+ \to [0,1]$ is a suitable smooth function, and

$$D(x) = \|x - \bar x\|^2 = \sum_i (x_i - \bar x_i)^2.$$

$^4$ We remark that a construction analogous to [19] would be $\tilde F(x) = F(x) - \phi(H(x))$ where $H(x) = F(x) - G(x)$. While this makes the analysis easier in [19], it cannot be used in general. Roughly speaking, the problem is that in general the partial derivatives of $H(x)$ are not bounded in any way by the value of $H(x)$.

The idea is that when $x$ is close to $\bar x$, $\phi(D(x))$ should be close to 1, i.e. the convex linear combination should give most of the weight to $G(x)$. The weight should shift gradually to $F(x)$ as $x$ gets further away from $\bar x$. Therefore, we define $\phi(t) = 1$ in a small interval $t \in [0, \delta]$, and $\phi(t)$ tends to 0 as $t$ increases. This guarantees that $\tilde F(x) = G(x)$ whenever $D(x) = \|x - \bar x\|^2 \le \delta$. We defer the precise construction of $\phi(t)$ to Lemma 7, after we determine what properties we need from $\phi(t)$. Note that regardless of the definition of $\phi(t)$, $\tilde F(x)$ is symmetric with respect to $\mathcal{G}$, since $F(x)$, $G(x)$ and $D(x)$ are.



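Analysis of the construction. Due to the construction of $\tilde F(x)$, it is clear that when $D(x) = \|x - \bar x\|^2$ is small, $\tilde F(x) = G(x)$. When $D(x)$ is large, $\tilde F(x) \approx F(x)$. The main issue, however, is whether we can say something about the first and second partial derivatives of $\tilde F$. This is crucial for the properties of monotonicity and submodularity, which we would like to preserve. Let us write $\tilde F(x)$ as

$$\tilde F(x) = F(x) - \phi(D(x)) H(x)$$

where $H(x) = F(x) - G(x)$. By differentiating once, we get

$$\frac{\partial \tilde F}{\partial x_i} = \frac{\partial F}{\partial x_i} - \phi(D(x)) \frac{\partial H}{\partial x_i} - \phi'(D(x)) \frac{\partial D}{\partial x_i} H(x) \qquad (1)$$

and by differentiating twice,

$$\frac{\partial^2 \tilde F}{\partial x_i \partial x_j} = \frac{\partial^2 F}{\partial x_i \partial x_j} - \phi(D(x)) \frac{\partial^2 H}{\partial x_i \partial x_j} - \phi''(D(x)) \frac{\partial D}{\partial x_i} \frac{\partial D}{\partial x_j} H(x) - \phi'(D(x)) \left( \frac{\partial D}{\partial x_j} \frac{\partial H}{\partial x_i} + \frac{\partial^2 D}{\partial x_i \partial x_j} H(x) + \frac{\partial D}{\partial x_i} \frac{\partial H}{\partial x_j} \right). \qquad (2)$$

The first two terms on the right-hand sides of (1) and (2) do not cause any difficulty, because they form convex linear combinations of the derivatives of $F(x)$ and $G(x)$, which have the properties that we need. The remaining terms might cause problems, however, and we need to estimate them.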

Our strategy is to define $\phi(t)$ in such a way that it eliminates the influence of partial derivatives of $D$ and $H$ where they become too large. Roughly speaking, $D$ and $H$ have negligible partial derivatives when $x$ is very close to $\bar x$. As $x$ moves away from $\bar x$, the partial derivatives grow but then the behavior of $\phi(t)$ must be such that their influence is suppressed.

We start with the following important claim.$^5$



Lemma 4. Assume that $F : [0,1]^X \to \mathbb{R}$ is invariant under a group of permutations of coordinates $\mathcal{G}$. Let $G(x) = F(\bar x)$, where $\bar x = E_{\sigma \in \mathcal{G}}[\sigma(x)]$. Then for any $x \in [0,1]^X$,

$$\nabla G|_x = \nabla F|_{\bar x}.$$


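Proof. To avoid confusion, we use $x$ for the arguments of the functions $F$ and $G$, and $u$, $\bar u$, etc. for points where their partial derivatives are evaluated. To rephrase, we want to prove that for any point $u$ and any coordinate $i$, the partial derivative of $G$ evaluated at $u$ equals the partial derivative of $F$ evaluated at $\bar u$:

$$\frac{\partial G}{\partial x_i}\Big|_{x=u} = \frac{\partial F}{\partial x_i}\Big|_{x=\bar u}.$$

First, consider $F(x)$. We assume that $F(x)$ is invariant under a group of permutations of coordinates $\mathcal{G}$, i.e. $F(x) = F(\sigma(x))$ for any $\sigma \in \mathcal{G}$. Differentiating both sides at $x = u$, we get by the chain rule:

$$\frac{\partial F}{\partial x_i}\Big|_{x=u} = \sum_j \frac{\partial F}{\partial x_j}\Big|_{x=\sigma(u)} \frac{\partial}{\partial x_i} (\sigma(x))_j = \sum_j \frac{\partial F}{\partial x_j}\Big|_{x=\sigma(u)} \frac{\partial x_{\sigma(j)}}{\partial x_i}.$$

Here, $\frac{\partial x_{\sigma(j)}}{\partial x_i} = 1$ if $\sigma(j) = i$, and 0 otherwise. Therefore,

$$\frac{\partial F}{\partial x_i}\Big|_{x=u} = \frac{\partial F}{\partial x_{\sigma^{-1}(i)}}\Big|_{x=\sigma(u)}.$$

Now, if we evaluate the left-hand side at $\bar u$, the right-hand side is evaluated at $\sigma(\bar u) = \bar u$, and hence for any $i$ and any $\sigma \in \mathcal{G}$,

$$\frac{\partial F}{\partial x_i}\Big|_{x=\bar u} = \frac{\partial F}{\partial x_{\sigma^{-1}(i)}}\Big|_{x=\bar u}. \qquad (3)$$

Turning to $G(x) = F(\bar x)$, let us write $\frac{\partial G}{\partial x_i}$ using the chain rule:

$$\frac{\partial G}{\partial x_i}\Big|_{x=u} = \frac{\partial}{\partial x_i} F(\bar x)\Big|_{x=u} = \sum_j \frac{\partial F}{\partial x_j}\Big|_{x=\bar u} \frac{\partial \bar x_j}{\partial x_i}.$$

We have $\bar x_j = E_{\sigma \in \mathcal{G}}[x_{\sigma(j)}]$, and so

$$\frac{\partial G}{\partial x_i}\Big|_{x=u} = \sum_j \frac{\partial F}{\partial x_j}\Big|_{x=\bar u} \frac{\partial}{\partial x_i} E_{\sigma \in \mathcal{G}}[x_{\sigma(j)}] = E_{\sigma \in \mathcal{G}}\left[ \sum_j \frac{\partial F}{\partial x_j}\Big|_{x=\bar u} \frac{\partial x_{\sigma(j)}}{\partial x_i} \right].$$

Again, $\frac{\partial x_{\sigma(j)}}{\partial x_i} = 1$ if $\sigma(j) = i$ and 0 otherwise. Consequently, we obtain

$$\frac{\partial G}{\partial x_i}\Big|_{x=u} = E_{\sigma \in \mathcal{G}}\left[ \frac{\partial F}{\partial x_{\sigma^{-1}(i)}}\Big|_{x=\bar u} \right] = \frac{\partial F}{\partial x_i}\Big|_{x=\bar u}$$

where we used Eq. (3) to remove the dependence on $\sigma \in \mathcal{G}$. □

$^5$ We remind the reader that $\nabla F$, the gradient of $F$, is a vector whose coordinates are the first partial derivatives $\frac{\partial F}{\partial x_i}$. We denote by $\nabla F|_x$ the gradient evaluated at $x$.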



Observe that the symmetrization operation $\bar x$ is idempotent, i.e. $\bar{\bar x} = \bar x$. Because of this, we also get $\nabla G|_{\bar x} = \nabla F|_{\bar x}$. Note that $G(\bar x) = F(\bar x)$ follows from the definition, but it is not obvious that the same holds for gradients, since their definition involves points where $G(x) \ne F(x)$. For second partial derivatives, the equality no longer holds, as can be seen from a simple example such as $F(x_1, x_2) = 1 - (1 - x_1)(1 - x_2)$, $G(x_1, x_2) = 1 - \left(1 - \frac{x_1 + x_2}{2}\right)^2$.

Next, we show that the functions $F(x)$ and $G(x)$ are very similar in the close vicinity of the region where $\bar x = x$. Recall our definitions: $H(x) = F(x) - G(x)$, $D(x) = \|x - \bar x\|^2$. Based on Lemma 4, we know that $H(\bar x) = 0$ and $\nabla H|_{\bar x} = 0$. In the following lemmas, we present bounds on $H(x)$, $D(x)$ and their partial derivatives.
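The two-variable example above can be checked numerically. The sketch below is only an illustration: the helper names `grad` and `d2_mixed` and the sample point are arbitrary choices, and derivatives are estimated by central finite differences. It compares the gradient of $G$ at a point $u$ with the gradient of $F$ at the symmetrized point $\bar u$, and then compares the mixed second derivatives of $F$ and $G$.

```python
def F(x1, x2):
    return 1 - (1 - x1) * (1 - x2)

def G(x1, x2):
    # Symmetrization under swapping the two coordinates: both become the mean.
    m = (x1 + x2) / 2
    return F(m, m)

def grad(func, x1, x2, h=1e-5):
    """Central finite-difference estimate of the gradient."""
    return ((func(x1 + h, x2) - func(x1 - h, x2)) / (2 * h),
            (func(x1, x2 + h) - func(x1, x2 - h)) / (2 * h))

def d2_mixed(func, x1, x2, h=1e-3):
    """Central finite-difference estimate of the mixed second derivative."""
    return (func(x1 + h, x2 + h) - func(x1 + h, x2 - h)
            - func(x1 - h, x2 + h) + func(x1 - h, x2 - h)) / (4 * h * h)

u = (0.2, 0.7)
m = sum(u) / 2
print("grad G at u:     ", grad(G, *u))
print("grad F at u-bar: ", grad(F, m, m))
print("mixed second derivatives (F, G):", d2_mixed(F, *u), d2_mixed(G, *u))
```

The two gradients should agree, while the mixed second derivatives should differ, matching the remark that the equality of Lemma 4 does not extend to second derivatives.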



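Lemma 5. Let $f : 2^X \to [0, M]$ be invariant under a permutation group $\mathcal{G}$. Let $\bar x = E_{\sigma \in \mathcal{G}}[\sigma(x)]$, $D(x) = \|x - \bar x\|^2$ and $H(x) = F(x) - G(x)$ where $F(x) = E[f(\hat x)]$ and $G(x) = F(\bar x)$. Then

1. $\left| \frac{\partial^2 H}{\partial x_i \partial x_j} \right| \le 8M$ everywhere, for all $i, j$;

2. $\|\nabla H(x)\| \le 8M|X| \sqrt{D(x)}$;

3. $|H(x)| \le 8M|X| \cdot D(x)$.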

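Proof. First, let us get a bound on the second partial derivatives. Assuming without loss of generality $x_i = x_j = 0$, we have$^6$

$$\frac{\partial^2 F}{\partial x_i \partial x_j} = E[f(\hat x \vee (e_i + e_j)) - f(\hat x \vee e_i) - f(\hat x \vee e_j) + f(\hat x)]$$

(see [17]). Consequently,

$$\left| \frac{\partial^2 F}{\partial x_i \partial x_j} \right| \le 4 \max |f(S)| = 4M.$$

$^6$ $x \vee y$ denotes the coordinate-wise maximum, $(x \vee y)_i = \max\{x_i, y_i\}$, and $x \wedge y$ denotes the coordinate-wise minimum, $(x \wedge y)_i = \min\{x_i, y_i\}$.

It is a little bit more involved to analyze $\frac{\partial^2 G}{\partial x_i \partial x_j}$. Since $G(x) = F(\bar x)$ and $\bar x = E_{\sigma \in \mathcal{G}}[\sigma(x)]$, we get by the chain rule (with the second partial derivatives of $F$ evaluated at $\bar x$):

$$\frac{\partial^2 G}{\partial x_i \partial x_j} = \sum_{k,\ell} \frac{\partial^2 F}{\partial x_k \partial x_\ell} \frac{\partial \bar x_k}{\partial x_i} \frac{\partial \bar x_\ell}{\partial x_j} = E_{\sigma,\tau \in \mathcal{G}}\left[ \sum_{k,\ell} \frac{\partial^2 F}{\partial x_k \partial x_\ell} \frac{\partial x_{\sigma(k)}}{\partial x_i} \frac{\partial x_{\tau(\ell)}}{\partial x_j} \right].$$

It is useful here to use the Kronecker symbol, $\delta_{i,j}$, which is 1 if $i = j$ and 0 otherwise. Note that $\frac{\partial x_{\sigma(k)}}{\partial x_i} = \delta_{i,\sigma(k)} = \delta_{\sigma^{-1}(i),k}$, etc. Using this notation, we get

$$\frac{\partial^2 G}{\partial x_i \partial x_j} = E_{\sigma,\tau \in \mathcal{G}}\left[ \sum_{k,\ell} \frac{\partial^2 F}{\partial x_k \partial x_\ell} \delta_{\sigma^{-1}(i),k} \, \delta_{\tau^{-1}(j),\ell} \right] = E_{\sigma,\tau \in \mathcal{G}}\left[ \frac{\partial^2 F}{\partial x_{\sigma^{-1}(i)} \partial x_{\tau^{-1}(j)}} \right],$$

$$\left| \frac{\partial^2 G}{\partial x_i \partial x_j} \right| \le E_{\sigma,\tau \in \mathcal{G}}\left[ \left| \frac{\partial^2 F}{\partial x_{\sigma^{-1}(i)} \partial x_{\tau^{-1}(j)}} \right| \right] \le 4M$$

and therefore

$$\left| \frac{\partial^2 H}{\partial x_i \partial x_j} \right| = \left| \frac{\partial^2 F}{\partial x_i \partial x_j} - \frac{\partial^2 G}{\partial x_i \partial x_j} \right| \le 8M.$$

Next, we estimate $\frac{\partial H}{\partial x_i}$ at a given point $u$, depending on its distance from $\bar u$. Consider the line segment between $\bar u$ and $u$. The function $H(x) = F(x) - G(x)$ is $C^\infty$-differentiable, and hence we can apply the mean value theorem to $\frac{\partial H}{\partial x_i}$: There exists a point $\tilde u$ on the line segment $[\bar u, u]$ such that

$$\frac{\partial H}{\partial x_i}\Big|_{x=u} - \frac{\partial H}{\partial x_i}\Big|_{x=\bar u} = \sum_j \frac{\partial^2 H}{\partial x_j \partial x_i}\Big|_{x=\tilde u} (u_j - \bar u_j).$$

Recall that $\frac{\partial H}{\partial x_i}\big|_{x=\bar u} = 0$. Applying the Cauchy-Schwarz inequality to the right-hand side, we get

$$\left( \frac{\partial H}{\partial x_i}\Big|_{x=u} \right)^2 \le \sum_j \left( \frac{\partial^2 H}{\partial x_j \partial x_i}\Big|_{x=\tilde u} \right)^2 \|u - \bar u\|^2 \le (8M)^2 |X| \, \|u - \bar u\|^2.$$

Adding up over all $i \in X$, we obtain

$$\|\nabla H(u)\|^2 = \sum_i \left( \frac{\partial H}{\partial x_i}\Big|_{x=u} \right)^2 \le (8M|X|)^2 \|u - \bar u\|^2.$$

Finally, we estimate the growth of $H(u)$. Again, by the mean value theorem, there is a point $\tilde u$ on the line segment $[\bar u, u]$, such that

$$H(u) - H(\bar u) = (u - \bar u) \cdot \nabla H(\tilde u).$$

Using $H(\bar u) = 0$, the Cauchy-Schwarz inequality and the above bound on $\nabla H$,

$$(H(u))^2 \le \|\nabla H(\tilde u)\|^2 \|u - \bar u\|^2 \le (8M|X|)^2 \|\tilde u - \bar{\tilde u}\|^2 \|u - \bar u\|^2.$$

Clearly, $\|\tilde u - \bar{\tilde u}\| \le \|u - \bar u\|$, and therefore

$$|H(u)| \le 8M|X| \cdot \|u - \bar u\|^2.$$

□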
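As a sanity check of the third bound in Lemma 5, consider a toy instance that is not taken from the paper: $X = \{0, 1\}$, the cut function of a single undirected edge ($f(\emptyset) = f(\{0,1\}) = 0$, $f(\{0\}) = f(\{1\}) = 1$, so $M = 1$), and the group generated by swapping the two elements. The function name `worst_ratio` is hypothetical. In this instance one can verify by hand that $H(x) = D(x) = (x_0 - x_1)^2/2$, so the ratio $|H(x)|/D(x)$ is far below the bound $8M|X| = 16$.

```python
import random

M, SIZE = 1.0, 2   # f takes values in [0, M]; |X| = 2

def F(x0, x1):
    # Multilinear extension of f(empty)=0, f({0})=f({1})=1, f({0,1})=0.
    return x0 * (1 - x1) + x1 * (1 - x0)

def H_and_D(x0, x1):
    m = (x0 + x1) / 2                      # the symmetrized point is (m, m)
    return F(x0, x1) - F(m, m), (x0 - m) ** 2 + (x1 - m) ** 2

def worst_ratio(trials=1000, seed=1):
    """Largest observed |H(x)| / D(x) over random points where D(x) is not tiny."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        h, d = H_and_D(rng.random(), rng.random())
        if d > 1e-6:
            worst = max(worst, abs(h) / d)
    return worst

print(worst_ratio(), "<=", 8 * M * SIZE)
```

The constant $8M|X|$ is a worst-case bound; this instance only shows that the inequality is comfortably satisfied in a simple case.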



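Lemma 6. For the function $D(x) = \|x - \bar x\|^2$, we have

1. $\nabla D = 2(x - \bar x)$, and therefore $\|\nabla D\| = 2\sqrt{D(x)}$.

2. For all $i, j$, $\left| \frac{\partial^2 D}{\partial x_i \partial x_j} \right| \le 2$.

Proof. Let us write $D(x)$ as

$$D(x) = \sum_i (x_i - \bar x_i)^2 = \sum_i E_{\sigma \in \mathcal{G}}[x_i - x_{\sigma(i)}] \, E_{\tau \in \mathcal{G}}[x_i - x_{\tau(i)}].$$

Taking the first partial derivative,

$$\frac{\partial D}{\partial x_j} = 2 \sum_i E_{\sigma \in \mathcal{G}}[x_i - x_{\sigma(i)}] \frac{\partial}{\partial x_j} E_{\tau \in \mathcal{G}}[x_i - x_{\tau(i)}].$$

As before, we have $\frac{\partial x_i}{\partial x_j} = \delta_{ij}$. Using this notation, we get

$$\frac{\partial D}{\partial x_j} = 2 \sum_i E_{\sigma,\tau \in \mathcal{G}}[(x_i - x_{\sigma(i)})(\delta_{ij} - \delta_{\tau(i),j})] = 2 E_{\sigma,\tau \in \mathcal{G}}[x_j - x_{\sigma(j)} - x_{\tau^{-1}(j)} + x_{\sigma(\tau^{-1}(j))}].$$

Since the distributions of $\sigma(j)$, $\tau^{-1}(j)$ and $\sigma(\tau^{-1}(j))$ are the same, we obtain

$$\frac{\partial D}{\partial x_j} = 2 E_{\sigma \in \mathcal{G}}[x_j - x_{\sigma(j)}] = 2(x_j - \bar x_j)$$

and

$$\|\nabla D\|^2 = \sum_j \left| \frac{\partial D}{\partial x_j} \right|^2 = 4 \sum_j (x_j - \bar x_j)^2 = 4 D(x).$$

Finally, the second partial derivatives are

$$\frac{\partial^2 D}{\partial x_i \partial x_j} = 2 \frac{\partial}{\partial x_i} (x_j - \bar x_j) = 2 \frac{\partial}{\partial x_i} E_{\sigma \in \mathcal{G}}[x_j - x_{\sigma(j)}] = 2 E_{\sigma \in \mathcal{G}}[\delta_{ij} - \delta_{i,\sigma(j)}]$$

which is clearly bounded by 2 in the absolute value. □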

Now we come back to $\tilde F(x)$ and its partial derivatives. Recall equations (1) and (2). The problematic terms are those involving $\phi'(D(x))$ and $\phi''(D(x))$. Using our bounds on $H(x)$, $D(x)$ and their derivatives, however, we notice that $\phi'(D(x))$ always appears with factors on the order of $D(x)$ and $\phi''(D(x))$ appears with factors on the order of $(D(x))^2$. Thus, it is sufficient if $\phi(t)$ is defined so that we have control over $t\phi'(t)$ and $t^2\phi''(t)$. The following lemma describes the function that we need.

Lemma 7. For any $\alpha, \beta > 0$, there is $\delta \in (0, \beta)$ and a function $\phi : \mathbb{R}_+ \to [0,1]$ with an absolutely continuous first derivative such that

1. For $t \le \delta$, $\phi(t) = 1$.

2. For $t \ge \beta$, $\phi(t) < e^{-1/\alpha}$.

3. For all $t \ge 0$, $|t\phi'(t)| \le 4\alpha$.

4. For almost all $t \ge 0$, $|t^2\phi''(t)| \le 10\alpha$.

Proof. First, observe that the quantities $t\phi'(t)$ and $t^2\phi''(t)$ are invariant under scaling of $t$. Therefore, we can assume without loss of generality that $\beta > 0$ is a value of our choice, for example $\beta = e^{1/(2\alpha^2)} + 1$. If we want to prove the result for a different value of $\beta$, we can just scale the argument $t$ and the constant $\delta$ by $\beta / (e^{1/(2\alpha^2)} + 1)$. The bounds on $t\phi'(t)$ and $t^2\phi''(t)$ still hold.

We can assume that $\alpha \in (0, \frac{1}{8})$ because for larger $\alpha$, the statement only gets weaker. As we argued, we can assume WLOG that $\beta = e^{1/(2\alpha^2)} + 1$. We set $\delta = \delta_1 = 1$ and $\delta_2 = 1 + (1 + \alpha)^{-1/2} \le 2$. (We remind the reader that in general these values will be scaled depending on the actual value of $\beta$.) We define the function as follows:

1. $\phi(t) = 1$ for $t \in [0, \delta_1]$.

2. $\phi(t) = 1 - \alpha (t - 1)^2$ for $t \in [\delta_1, \delta_2]$.

3. $\phi(t) = (1 + \alpha)^{-1-\alpha} (t - 1)^{-2\alpha}$ for $t \in [\delta_2, \infty)$.

Let us verify the properties of $\phi(t)$. For $t \in [0, \delta_1]$, we have $\phi'(t) = \phi''(t) = 0$. For $t \in [\delta_1, \delta_2]$, we have

$$\phi'(t) = -2\alpha (t - 1), \qquad \phi''(t) = -2\alpha,$$

and for $t \in [\delta_2, \infty)$,

$$\phi'(t) = -2\alpha (1 + \alpha)^{-1-\alpha} (t - 1)^{-2\alpha - 1},$$

$$\phi''(t) = 2\alpha (1 + 2\alpha)(1 + \alpha)^{-1-\alpha} (t - 1)^{-2\alpha - 2}.$$

First, we check that the values and first derivatives agree at the breakpoints. For $t = \delta_1 = 1$, we get $\phi(1) = 1$ and $\phi'(1) = 0$. For $t = \delta_2 = 1 + (1 + \alpha)^{-1/2}$, we get $\phi(\delta_2) = (1 + \alpha)^{-1}$ and $\phi'(\delta_2) = -2\alpha (1 + \alpha)^{-1/2}$. Next, we need to check that $\phi(t)$ is very small for $t \ge \beta$. The function is decreasing for $t \ge \beta$, therefore it is enough to check $t = \beta = e^{1/(2\alpha^2)} + 1$:

$$\phi(\beta) = (1 + \alpha)^{-1-\alpha} (\beta - 1)^{-2\alpha} < (\beta - 1)^{-2\alpha} = e^{-1/\alpha}.$$

The derivative bounds are satisfied trivially for $t \in [0, \delta_1]$. For $t \in [\delta_1, \delta_2]$, using $t \le \delta_2 = 1 + (1 + \alpha)^{-1/2}$,

$$|t\phi'(t)| = t \cdot 2\alpha (t - 1) \le 2\alpha (1 + (1 + \alpha)^{-1/2})(1 + \alpha)^{-1/2} \le 4\alpha,$$

and

$$|t^2\phi''(t)| = t^2 \cdot 2\alpha \le 2\alpha (1 + (1 + \alpha)^{-1/2})^2 \le 8\alpha.$$

For $t \in [\delta_2, \infty)$, using $t - 1 \ge (1 + \alpha)^{-1/2}$ (so that $(1 + \alpha)^{-\alpha} (t - 1)^{-2\alpha} \le 1$),

$$|t\phi'(t)| \le \frac{2\alpha}{1 + \alpha} \cdot \frac{t}{t - 1} \le \frac{2\alpha}{1 + \alpha} \cdot \frac{1 + (1 + \alpha)^{-1/2}}{(1 + \alpha)^{-1/2}} = 2\alpha \cdot \frac{1 + (1 + \alpha)^{-1/2}}{(1 + \alpha)^{1/2}} \le 4\alpha,$$

and finally,

$$|t^2\phi''(t)| \le \frac{2\alpha (1 + 2\alpha)}{1 + \alpha} \left( \frac{t}{t - 1} \right)^2 \le \frac{2\alpha (1 + 2\alpha)}{1 + \alpha} \left( \frac{1 + (1 + \alpha)^{-1/2}}{(1 + \alpha)^{-1/2}} \right)^2 \le 8\alpha (1 + 2\alpha) \le 10\alpha.$$

□

Using the bounds from Lemmas 5, 6 and 7, we can prove bounds on the derivatives of $\tilde F(x)$.
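The transition function from the proof of Lemma 7 can be written out and checked numerically. The sketch below transcribes the unscaled piecewise definition, assuming breakpoints $1$ and $1 + (1+\alpha)^{-1/2}$ and $\beta = e^{1/(2\alpha^2)} + 1$; the name `make_phi`, the choice $\alpha = 0.1$ and the sampling grid are illustrative assumptions, and a finite grid can only spot-check the bounds, not prove them.

```python
import math

def make_phi(alpha):
    """Return (phi, phi', phi'') for the unscaled construction with delta_1 = 1."""
    d2 = 1.0 + (1.0 + alpha) ** -0.5          # second breakpoint delta_2
    c = (1.0 + alpha) ** (-1.0 - alpha)       # constant of the power-law tail

    def phi(t):
        if t <= 1.0:
            return 1.0
        if t <= d2:
            return 1.0 - alpha * (t - 1.0) ** 2
        return c * (t - 1.0) ** (-2.0 * alpha)

    def dphi(t):
        if t <= 1.0:
            return 0.0
        if t <= d2:
            return -2.0 * alpha * (t - 1.0)
        return -2.0 * alpha * c * (t - 1.0) ** (-2.0 * alpha - 1.0)

    def d2phi(t):
        # Piecewise second derivative; it jumps at the two breakpoints.
        if t <= 1.0:
            return 0.0
        if t <= d2:
            return -2.0 * alpha
        return (2.0 * alpha * (1.0 + 2.0 * alpha) * c
                * (t - 1.0) ** (-2.0 * alpha - 2.0))

    return phi, dphi, d2phi

alpha = 0.1
phi, dphi, d2phi = make_phi(alpha)
beta = math.exp(1.0 / (2.0 * alpha ** 2)) + 1.0
ts = [10 ** (k / 50.0) for k in range(-100, 1200)]   # log-spaced sample of t

print(max(abs(t * dphi(t)) for t in ts) <= 4 * alpha)
print(max(abs(t * t * d2phi(t)) for t in ts) <= 10 * alpha)
print(phi(beta) < math.exp(-1.0 / alpha))
```

For a target $\beta$ other than $e^{1/(2\alpha^2)} + 1$, the proof rescales the argument $t$; the sketch leaves that rescaling out.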



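Lemma 8. Let $\tilde F(x) = (1 - \phi(D(x))) F(x) + \phi(D(x)) G(x)$ where $F(x) = E[f(\hat x)]$, $f : 2^X \to [0, M]$, $G(x) = F(\bar x)$, $D(x) = \|x - \bar x\|^2$ are as above and $\phi(t)$ is provided by Lemma 7 for a given $\alpha > 0$. Then, whenever $\frac{\partial^2 F}{\partial x_i \partial x_j} \le 0$,

$$\frac{\partial^2 \tilde F}{\partial x_i \partial x_j} \le 512 M|X|\alpha.$$

If, in addition, $\frac{\partial F}{\partial x_i} \ge 0$, then

$$\frac{\partial \tilde F}{\partial x_i} \ge -64 M|X|\alpha.$$

Proof. We have $\tilde F(x) = F(x) - \phi(D(x)) H(x)$ where $H(x) = F(x) - G(x)$. By differentiating once, we get

$$\frac{\partial \tilde F}{\partial x_i} = \frac{\partial F}{\partial x_i} - \phi(D(x)) \frac{\partial H}{\partial x_i} - \phi'(D(x)) \frac{\partial D}{\partial x_i} H(x),$$

i.e.

$$\left| \frac{\partial \tilde F}{\partial x_i} - \left( \frac{\partial F}{\partial x_i} - \phi(D(x)) \frac{\partial H}{\partial x_i} \right) \right| = \left| \phi'(D(x)) \frac{\partial D}{\partial x_i} H(x) \right|.$$

By Lemmas 5 and 6, we have $\left| \frac{\partial D}{\partial x_i} \right| = 2|x_i - \bar x_i| \le 2$ and $|H(x)| \le 8M|X| \cdot D(x)$. Therefore,

$$\left| \frac{\partial \tilde F}{\partial x_i} - \left( \frac{\partial F}{\partial x_i} - \phi(D(x)) \frac{\partial H}{\partial x_i} \right) \right| \le 16 M|X| \, D(x) \, |\phi'(D(x))|.$$

By Lemma 7, $|D(x)\phi'(D(x))| \le 4\alpha$, and hence

$$\left| \frac{\partial \tilde F}{\partial x_i} - \left( \frac{\partial F}{\partial x_i} - \phi(D(x)) \frac{\partial H}{\partial x_i} \right) \right| \le 64 M|X|\alpha.$$

Assuming that $\frac{\partial F}{\partial x_i} \ge 0$, we also have $\frac{\partial G}{\partial x_i} \ge 0$ (see Lemma 4) and therefore $\frac{\partial F}{\partial x_i} - \phi(D(x)) \frac{\partial H}{\partial x_i} = (1 - \phi(D(x))) \frac{\partial F}{\partial x_i} + \phi(D(x)) \frac{\partial G}{\partial x_i} \ge 0$. Consequently,

$$\frac{\partial \tilde F}{\partial x_i} \ge -64 M|X|\alpha.$$

By differentiating $\tilde F$ twice, we obtain

$$\frac{\partial^2 \tilde F}{\partial x_i \partial x_j} = \frac{\partial^2 F}{\partial x_i \partial x_j} - \phi(D(x)) \frac{\partial^2 H}{\partial x_i \partial x_j} - \phi''(D(x)) \frac{\partial D}{\partial x_i} \frac{\partial D}{\partial x_j} H(x) - \phi'(D(x)) \left( \frac{\partial D}{\partial x_j} \frac{\partial H}{\partial x_i} + \frac{\partial^2 D}{\partial x_i \partial x_j} H(x) + \frac{\partial D}{\partial x_i} \frac{\partial H}{\partial x_j} \right).$$

Again, we use Lemmas 5 and 6 to bound $|H(x)| \le 8M|X| D(x)$, $\left| \frac{\partial H}{\partial x_i} \right| \le 8M|X| \sqrt{D(x)}$, $\left| \frac{\partial^2 H}{\partial x_i \partial x_j} \right| \le 8M$, $\left| \frac{\partial D}{\partial x_i} \right| \le 2\sqrt{D(x)}$ and $\left| \frac{\partial^2 D}{\partial x_i \partial x_j} \right| \le 2$. We get

$$\left| \frac{\partial^2 \tilde F}{\partial x_i \partial x_j} - \left( \frac{\partial^2 F}{\partial x_i \partial x_j} - \phi(D(x)) \frac{\partial^2 H}{\partial x_i \partial x_j} \right) \right| \le 32 M|X| \, |D^2(x) \phi''(D(x))| + 48 M|X| \, |D(x) \phi'(D(x))|.$$

Observe that $\phi'(D(x))$ appears with $D(x)$ and $\phi''(D(x))$ appears with $(D(x))^2$. By Lemma 7, $|D(x)\phi'(D(x))| \le 4\alpha$ and $|D^2(x)\phi''(D(x))| \le 10\alpha$. Therefore,

$$\left| \frac{\partial^2 \tilde F}{\partial x_i \partial x_j} - \left( \frac{\partial^2 F}{\partial x_i \partial x_j} - \phi(D(x)) \frac{\partial^2 H}{\partial x_i \partial x_j} \right) \right| \le 320 M|X|\alpha + 192 M|X|\alpha = 512 M|X|\alpha.$$

If $\frac{\partial^2 F}{\partial x_i \partial x_j} \le 0$, then also $\frac{\partial^2 G}{\partial x_i \partial x_j} \le 0$ (see the proof of Lemma 5) and $\frac{\partial^2 F}{\partial x_i \partial x_j} - \phi(D(x)) \frac{\partial^2 H}{\partial x_i \partial x_j} = (1 - \phi(D(x))) \frac{\partial^2 F}{\partial x_i \partial x_j} + \phi(D(x)) \frac{\partial^2 G}{\partial x_i \partial x_j} \le 0$. We obtain

$$\frac{\partial^2 \tilde F}{\partial x_i \partial x_j} \le 512 M|X|\alpha.$$

□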



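Finally, we can finish the proof of Lemma 2.

Proof of Lemma 2. Let $\epsilon > 0$ and $f : 2^X \to [0, M]$. We choose $\beta = \frac{\epsilon}{16M|X|}$, so that $|H(x)| = |F(x) - G(x)| \le \epsilon/2$ whenever $D(x) = \|x - \bar x\|^2 \le \beta$ (due to Lemma 5, which states that $|H(x)| \le 8M|X| D(x)$). Also, let $\alpha = \frac{\epsilon}{2000 M|X|^3}$. For these values of $\alpha, \beta > 0$, let $\delta > 0$ and $\phi : \mathbb{R}_+ \to [0,1]$ be provided by Lemma 7. We define

$$\tilde F(x) = (1 - \phi(D(x))) F(x) + \phi(D(x)) G(x).$$

Lemma 8 provides bounds on the first and second partial derivatives of $\tilde F(x)$. Finally, we modify $\tilde F(x)$ so that it satisfies the required conditions (submodularity and optionally monotonicity). For that purpose, we add a suitable multiple of the following function:

$$J(x) = |X|^2 + 3|X| \sum_{i \in X} x_i - \left( \sum_{i \in X} x_i \right)^2.$$

We have $0 \le J(x) \le 3|X|^2$, $\frac{\partial J}{\partial x_i} = 3|X| - 2 \sum_{i \in X} x_i \ge |X|$. Further, $\frac{\partial^2 J}{\partial x_i \partial x_j} = -2$. Note also that $J(\bar x) = J(x)$, since $J(x)$ depends only on the sum of all coordinates $\sum_{i \in X} x_i$. To make $\tilde F(x)$ submodular and optionally monotone, we define:

$$\hat F(x) = \tilde F(x) + 256 M|X|\alpha J(x),$$

$$\hat G(x) = G(x) + 256 M|X|\alpha J(x).$$

We verify the properties of $\hat F(x)$ and $\hat G(x)$:

1. For any $x \in P(\mathcal{F})$, we have

$$\hat G(x) = G(x) + 256 M|X|\alpha J(x) = \tilde F(\bar x) + 256 M|X|\alpha J(\bar x) = \hat F(\bar x).$$

2. When $D(x) = \|x - \bar x\|^2 \ge \beta$, Lemma 7 guarantees that $0 \le \phi(D(x)) < e^{-1/\alpha} \le \alpha$ and

$$|\hat F(x) - F(x)| \le \phi(D(x)) |G(x) - F(x)| + 256 M|X|\alpha J(x) \le \alpha M + 768 M|X|^3 \alpha \le \epsilon$$

using $0 \le F(x), G(x) \le M$, $0 \le J(x) \le 3|X|^2$ and $\alpha = \frac{\epsilon}{2000 M|X|^3}$.

When $D(x) = \|x - \bar x\|^2 < \beta$, we chose the value of $\beta$ so that $|G(x) - F(x)| \le \epsilon/2$ and so by the above,

$$|\hat F(x) - F(x)| \le \phi(D(x)) |G(x) - F(x)| + 256 M|X|\alpha J(x) \le \epsilon/2 + 768 M|X|^3 \alpha \le \epsilon.$$

3. Due to Lemma 7, $\phi(t) = 1$ for $t \in [0, \delta]$. Hence, whenever $D(x) = \|x - \bar x\|^2 \le \delta$, we have $\tilde F(x) = G(x) = F(\bar x)$, which depends only on $\bar x$. Also, we have $\hat F(x) = \hat G(x) = F(\bar x) + 256 M|X|\alpha J(x)$ and again, $J(x)$ depends only on $\bar x$ (in fact, only on the average of all coordinates of $x$). Therefore, $\hat F(x)$ and $\hat G(x)$ in this case depend only on $\bar x$.

4. The first partial derivatives of $\hat F$ are given by the formula

$$\frac{\partial \hat F}{\partial x_i} = \frac{\partial F}{\partial x_i} - \phi(D(x)) \frac{\partial H}{\partial x_i} - \phi'(D(x)) \frac{\partial D}{\partial x_i} H(x) + 256 M|X|\alpha \frac{\partial J}{\partial x_i}.$$

The functions $F, H, D, J$ are infinitely differentiable, so the only possible issue is with $\phi$. By inspecting our construction of $\phi$ (Lemma 7), we can see that it is piecewise infinitely differentiable, and $\phi'$ is continuous at the breakpoints. Therefore, it is also absolutely continuous. This implies that $\frac{\partial \hat F}{\partial x_i}$ is absolutely continuous.

The function $\hat G(x) = F(\bar x) + 256 M|X|\alpha J(x)$ is infinitely differentiable, so its first partial derivatives are also absolutely continuous.

5. Assuming $\frac{\partial F}{\partial x_i} \ge 0$, we get $\frac{\partial \tilde F}{\partial x_i} \ge -64 M|X|\alpha$ by Lemma 8. Using $\frac{\partial J}{\partial x_i} \ge |X|$, we get $\frac{\partial \hat F}{\partial x_i} = \frac{\partial \tilde F}{\partial x_i} + 256 M|X|\alpha \frac{\partial J}{\partial x_i} \ge 0$. The same holds for $\frac{\partial \hat G}{\partial x_i}$ since $\frac{\partial G}{\partial x_i} \ge 0$.

6. Assuming $\frac{\partial^2 F}{\partial x_i \partial x_j} \le 0$, we get $\frac{\partial^2 \tilde F}{\partial x_i \partial x_j} \le 512 M|X|\alpha$ by Lemma 8. Using $\frac{\partial^2 J}{\partial x_i \partial x_j} = -2$, we get $\frac{\partial^2 \hat F}{\partial x_i \partial x_j} = \frac{\partial^2 \tilde F}{\partial x_i \partial x_j} + 256 M|X|\alpha \frac{\partial^2 J}{\partial x_i \partial x_j} \le 0$. The same holds for $\frac{\partial^2 \hat G}{\partial x_i \partial x_j}$ since $\frac{\partial^2 G}{\partial x_i \partial x_j} \le 0$.

□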
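The role of the correction term $J$ and of the constants in property 2 can be spot-checked numerically. The following sketch assumes the form $J(x) = |X|^2 + 3|X| \sum_i x_i - (\sum_i x_i)^2$, which is the form consistent with the bounds $0 \le J \le 3|X|^2$, $\partial J / \partial x_i \ge |X|$ and $\partial^2 J / \partial x_i \partial x_j = -2$ stated in the proof; the helper names and the ground set size are illustrative choices.

```python
import random

def J(x):
    n, s = len(x), sum(x)
    return n * n + 3 * n * s - s * s

def dJ(x):
    # All first partial derivatives coincide: 3|X| - 2 * (sum of coordinates).
    return 3 * len(x) - 2 * sum(x)

def check(n=5, trials=1000, seed=0):
    """Spot-check 0 <= J <= 3|X|^2 and dJ >= |X| on random points of [0,1]^X."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.random() for _ in range(n)]
        assert 0 <= J(x) <= 3 * n * n
        assert dJ(x) >= n
    return True

# Error budget of property 2 in units of epsilon, with alpha = eps / (2000 M |X|^3):
size = 5
budget_far = 1 / (2000 * size ** 3) + 768 / 2000    # alpha*M + 768 M |X|^3 alpha
budget_near = 0.5 + 768 / 2000                       # eps/2 + 768 M |X|^3 alpha
print(check(), budget_far < 1, budget_near < 1)
```

Both budgets stay below 1, i.e. below $\epsilon$, which is what property 2 requires. The constant 256 is also what makes properties 5 and 6 work: $256 \cdot |X| \ge 64$ absorbs the first-derivative error and $256 \cdot 2 = 512$ absorbs the second-derivative error from Lemma 8.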

This concludes the proofs of our main hardness results. 



4 Algorithms using the multilinear relaxation

Here we turn to our algorithmic results. First, we discuss the problem of maximizing a submodular (but not 
necessarily monotone) function subject to a matroid independence constraint. 

4.1 Matroid independence constraint 

Consider the problem $\max\{f(S) : S \in \mathcal{I}\}$, where $\mathcal{I}$ is the collection of independent sets in a matroid $\mathcal{M}$. We design an algorithm based on the multilinear relaxation of the problem, $\max\{F(x) : x \in P(\mathcal{M})\}$. Our algorithm can be seen as "continuous local search" in the matroid polytope $P(\mathcal{M})$, constrained in addition by the box $[0,t]^X$ for some fixed $t \in [0,1]$. The intuition is that this forces our local search to use fractional solutions that are more fuzzy than integral solutions and therefore less likely to get stuck in a local optimum. On the other hand, restraining the search space too much would not give us much freedom in searching for a good fractional point. This leads to a tradeoff and an optimal choice of $t \in [0,1]$ which we leave for later.

The matroid polytope is defined as $P(\mathcal{M}) = \mathrm{conv}\{1_I : I \in \mathcal{I}\}$, or equivalently [7] as $P(\mathcal{M}) = \{x \ge 0 : \forall S; \sum_{i \in S} x_i \le r_{\mathcal{M}}(S)\}$, where $r_{\mathcal{M}}(S)$ is the rank function of $\mathcal{M}$. We define

$$P_t(\mathcal{M}) = P(\mathcal{M}) \cap [0,t]^X = \{x \in P(\mathcal{M}) : \forall i; x_i \le t\}.$$

We consider the problem $\max\{F(x) : x \in P_t(\mathcal{M})\}$. We remind the reader that $F(x) = E[f(\hat x)]$ denotes the multilinear extension. Our algorithm works as follows.

Fractional local search in $P_t(\mathcal{M})$
(given $t = r/q$, $r \le q$ integer)

1. Start with $x := (0, 0, \ldots, 0)$. Fix $\delta = 1/q$.

2. If there is $i, j \in X$ and a direction $v \in \{e_j, -e_i, e_j - e_i\}$ such that $x + \delta v \in P_t(\mathcal{M})$ and $F(x + \delta v) > F(x)$, set $x := x + \delta v$ and repeat.

3. If there is no such direction $v$, apply pipage rounding to $x$ and return the resulting solution.
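Steps 1 and 2 of the procedure can be sketched in code for very small instances. The sketch below makes several simplifying assumptions that are not part of the paper: the matroid is a uniform matroid of rank `k` (so membership in $P_t(\mathcal{M})$ is a box constraint plus a sum constraint), the multilinear extension is evaluated exactly by enumerating all subsets, exact rational arithmetic is used, and the pipage rounding of step 3 is omitted, so a fractional point is returned. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def multilinear(f, x):
    """Exact F(x) = E[f(R)], where R contains each i independently with prob. x[i]."""
    total = Fraction(0)
    for bits in product((0, 1), repeat=len(x)):
        p = Fraction(1)
        for i, b in enumerate(bits):
            p *= x[i] if b else 1 - x[i]
        if p:
            total += p * f(frozenset(i for i, b in enumerate(bits) if b))
    return total

def in_polytope(x, k, t):
    """Membership in P_t(M) for the uniform matroid of rank k (illustrative choice)."""
    return all(0 <= xi <= t for xi in x) and sum(x) <= k

def directions(n):
    """All candidate moves +e_j, -e_i and e_j - e_i as sparse dictionaries."""
    dirs = []
    for j in range(n):
        dirs.append({j: 1})
        dirs.append({j: -1})
        dirs.extend({j: 1, i: -1} for i in range(n) if i != j)
    return dirs

def fractional_local_search(f, n, k, r, q):
    t, delta = Fraction(r, q), Fraction(1, q)
    x = [Fraction(0)] * n
    improved = True
    while improved:
        improved = False
        current = multilinear(f, x)
        for d in directions(n):
            y = list(x)
            for idx, sign in d.items():
                y[idx] += sign * delta
            # Accept the first feasible move that strictly improves F.
            if in_polytope(y, k, t) and multilinear(f, y) > current:
                x, improved = y, True
                break
    return x

# Example: directed cut function of a small graph (non-monotone submodular).
arcs = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
def cut(S):
    return sum(1 for u, v in arcs if u in S and v not in S)

x = fractional_local_search(cut, n=4, k=1, r=2, q=4)
print([str(c) for c in x], multilinear(cut, x))
```

Because every accepted move strictly increases $F$ and the points visited lie on a finite grid, the loop terminates. The enumeration makes each evaluation of $F$ exponential in $|X|$, so this is only meant for toy instances.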

Notes. The procedure as presented here would not run in polynomial time. A modification which runs in polynomial time is that we move to a new solution only if $F(x + \delta v) > F(x) + \frac{\delta}{poly(n)} OPT$ (where we first get a rough estimate of $OPT$ using previous methods). For simplicity, we analyze the variant above and finally discuss why we can modify it without losing too much in the approximation factor. We also defer the question of how to estimate the value of $F(x)$ to the end of this section.

For $t = 1$, we have $\delta = 1$ and the procedure reduces to discrete local search. However, it is known that discrete local search alone does not give any approximation guarantee. With additional modifications, an algorithm based on discrete local search achieves a $(\frac{1}{4} - o(1))$-approximation [16].

Our version of fractional local search avoids this issue and leads directly to a good fractional solution. 
Throughout the algorithm, we maintain x as a linear combination of q independent sets such that no element 
appears in more than r of them. A local step corresponds to an add/remove/switch operation preserving 
this condition. 

Finally, we use pipage rounding to convert a fractional solution into an integral one. As we show in Lemma 22, a modification of the technique from [3] can be used to find an integral solution without any loss in the objective function.

Theorem 5. The fractional local search algorithm for any fixed $t \in [0, \frac{1}{2}(3 - \sqrt{5})]$ returns a solution of value at least $(t - \frac{1}{2} t^2) OPT$, where $OPT = \max\{f(S) : S \in \mathcal{I}\}$.

We remark that for $t = \frac{1}{2}(3 - \sqrt{5})$, we would obtain a $\frac{1}{4}(-1 + \sqrt{5}) \approx 0.309$-approximation, improving the factor of $\frac{1}{4}$ [16]. This is not a rational value, but we can pick a rational $t$ arbitrarily close to $\frac{1}{2}(3 - \sqrt{5})$. For values $t > \frac{1}{2}(3 - \sqrt{5})$, our analysis does not yield a better approximation factor.
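The arithmetic behind the constant 0.309 is easy to reproduce. The snippet below evaluates the guarantee $t - \frac{1}{2}t^2$ at the threshold $t = \frac{1}{2}(3 - \sqrt{5})$ and at a nearby rational value; the particular rational $38/100$ is an arbitrary illustrative choice of $t = r/q$.

```python
import math
from fractions import Fraction

t_star = (3 - math.sqrt(5)) / 2          # threshold value of t in Theorem 5
ratio_star = t_star - t_star ** 2 / 2    # guarantee t - t^2/2 at the threshold
closed_form = (math.sqrt(5) - 1) / 4     # the same value in closed form

# A rational t = r/q slightly below the threshold, as the algorithm requires.
t_rat = Fraction(38, 100)
ratio_rat = t_rat - t_rat * t_rat / 2

print(t_star, ratio_star, closed_form)
print(t_rat, float(ratio_rat))
```

The rational choice loses only a small amount relative to the irrational threshold, which is why restricting to $t = r/q$ costs essentially nothing in the approximation factor.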

First, we discuss properties of the point found by the fractional local search algorithm. 

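Lemma 9. The outcome of the fractional local search algorithm $x$ is a "fractional local optimum" in the following sense. (All the partial derivatives are evaluated at $x$.)

• For any $i$ such that $x - \delta e_i \in P_t(\mathcal{M})$, $\frac{\partial F}{\partial x_i} \ge 0$.

• For any $j$ such that $x + \delta e_j \in P_t(\mathcal{M})$, $\frac{\partial F}{\partial x_j} \le 0$.

• For any $i, j$ such that $x + \delta(e_j - e_i) \in P_t(\mathcal{M})$, $\frac{\partial F}{\partial x_j} - \frac{\partial F}{\partial x_i} \le 0$.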

Proof. We use the property (see [3]) that along any direction $v = \pm e_i$ or $v = e_i - e_j$, the function $F(x + \lambda v)$ is a convex function of $\lambda$. Also, observe that if it is possible to move from $x$ in the direction of $v$ by any nonzero amount, then it is possible to move by $\delta v$, because all coordinates of $x$ are integer multiples of $\delta$ and all the constraints also have coefficients which are integer multiples of $\delta$. Therefore, if $\frac{dF}{d\lambda} > 0$ and it is possible to move in the direction of $v$, we would get $F(x + \delta v) > F(x)$ and the fractional local search would continue.

If $v = -e_i$ and it is possible to move along $-e_i$, we get $\frac{dF}{d\lambda} = -\frac{\partial F}{\partial x_i} \le 0$. Similarly, if $v = e_j$ and it is possible to move along $e_j$, we get $\frac{dF}{d\lambda} = \frac{\partial F}{\partial x_j} \le 0$. Finally, if $v = e_j - e_i$ and it is possible to move along $e_j - e_i$, we get $\frac{dF}{d\lambda} = \frac{\partial F}{\partial x_j} - \frac{\partial F}{\partial x_i} \le 0$. □

In the following, we refer to the following exchange property for matroids (which follows easily from [25],
Corollary 39.12a; see also [16]).

Lemma 10. If $I, C \in \mathcal{I}$, then for any $j \in C \setminus I$, there is $\pi(j) \subseteq I \setminus C$, $|\pi(j)| \le 1$, such that $I \setminus \pi(j) + j \in \mathcal{I}$.
Moreover, the sets $\pi(j)$ are disjoint (each $i \in I \setminus C$ appears at most once as $\pi(j) = \{i\}$).

Using this, we prove a lemma about fractional local optima which generalizes Lemma 2.2 in [16].

Lemma 11. Let $x$ be the outcome of fractional local search over $P_t(\mathcal{M})$. Let $C \in \mathcal{I}$ be any independent set.
Let $C' = \{i \in C : x_i < t\}$. Then

$$2F(x) \ge F(x \vee \mathbf{1}_{C'}) + F(x \wedge \mathbf{1}_C).$$

Note that for $t = 1$, the lemma reduces to $2F(x) \ge F(x \vee \mathbf{1}_C) + F(x \wedge \mathbf{1}_C)$ (similar to Lemma 2.2 in [16]).
For $t < 1$, however, it is necessary to replace $C$ by $C'$ in the first expression, which becomes apparent in the
proof. The reason is that we do not have any information on $\frac{\partial F}{\partial x_i}$ for coordinates where $x_i = t$.

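Proof. Let $C \in \mathcal{I}$ and assume $x \in P_t(\mathcal{M})$ is a local optimum. We can decompose it into a convex linear
combination of vertices of $P(\mathcal{M})$, $x = \sum_{I \in \mathcal{I}} x_I \mathbf{1}_I$ where $\sum_{I \in \mathcal{I}} x_I = 1$. By the smooth submodularity of $F(x)$
(see [27]),

$$F(x \vee \mathbf{1}_{C'}) - F(x) \le \sum_{j \in C'} (1 - x_j) \frac{\partial F}{\partial x_j} = \sum_{j \in C'} \sum_{I : j \notin I} x_I \frac{\partial F}{\partial x_j} = \sum_{I} x_I \sum_{j \in C' \setminus I} \frac{\partial F}{\partial x_j}.$$

All partial derivatives here are evaluated at $x$. On the other hand, also by submodularity,

$$F(x) - F(x \wedge \mathbf{1}_C) \ge \sum_{i \notin C} x_i \frac{\partial F}{\partial x_i} = \sum_{i \notin C} \sum_{I : i \in I} x_I \frac{\partial F}{\partial x_i} = \sum_{I} x_I \sum_{i \in I \setminus C} \frac{\partial F}{\partial x_i}.$$

To prove the lemma, it remains to prove the following.

Claim. Whenever $x_I > 0$, $\sum_{j \in C' \setminus I} \frac{\partial F}{\partial x_j} \le \sum_{i \in I \setminus C} \frac{\partial F}{\partial x_i}$.

Proof: For any $I \in \mathcal{I}$, we can apply Lemma 10 to get a mapping $\pi$ such that $I \setminus \pi(j) + j \in \mathcal{I}$ for any $j \in C \setminus I$.
Now, consider $j \in C' \setminus I$, i.e. $j \in C \setminus I$ and $x_j < t$.

If $\pi(j) = \emptyset$, it is possible to move from $x$ in the direction of $e_j$, because $I + j \in \mathcal{I}$ and hence we can replace
$I$ by $I + j$ (or at least we can do this for some nonzero fraction of its coefficient) in the linear combination.
Because $x_j < t$, we can move by a nonzero amount inside $P_t(\mathcal{M})$. By Lemma 9, $\frac{\partial F}{\partial x_j} \le 0$.

Similarly, if $\pi(j) = \{i\}$, it is possible to move in the direction of $e_j - e_i$, because $I$ can be replaced by
$I \setminus \pi(j) + j$ for some nonzero fraction of its coefficient. By Lemma 9, in this case $\frac{\partial F}{\partial x_j} - \frac{\partial F}{\partial x_i} \le 0$.

Finally, for any $i \in I$ we have $x_i > 0$ and therefore we can decrease $x_i$ while staying inside $P_t(\mathcal{M})$. By
Lemma 9, we have $\frac{\partial F}{\partial x_i} \ge 0$ for all $i \in I$. This means

$$\sum_{j \in C' \setminus I} \frac{\partial F}{\partial x_j} \le \sum_{j \in C' \setminus I : \pi(j) = \emptyset} \frac{\partial F}{\partial x_j} + \sum_{j \in C' \setminus I : \pi(j) = \{i\}} \frac{\partial F}{\partial x_i} \le \sum_{i \in I \setminus C} \frac{\partial F}{\partial x_i}$$

using the inequalities we derived above, and the fact that each $i \in I \setminus C$ appears at most once in $\pi(j)$. This
proves the Claim, and hence the Lemma. □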

Now we are ready to prove Theorem 5.

Proof. Let $x$ be the outcome of the fractional local search over $P_t(\mathcal{M})$. Define $A = \{i : x_i = t\}$. Let $C$ be
the optimum solution and $C' = C \setminus A = \{i \in C : x_i < t\}$. By Lemma 11,

$$2F(x) \ge F(x \vee \mathbf{1}_{C'}) + F(x \wedge \mathbf{1}_C).$$

First, let's analyze $F(x \wedge \mathbf{1}_C)$. We apply Lemma 18, which states that $F(x \wedge \mathbf{1}_C) \ge \mathbf{E}[f(T(x \wedge \mathbf{1}_C))]$. Here,
$T(x \wedge \mathbf{1}_C)$ is a random threshold set corresponding to the vector $x \wedge \mathbf{1}_C$, i.e.

$$T(x \wedge \mathbf{1}_C) = \{i : (x \wedge \mathbf{1}_C)_i > \lambda\} = \{i \in C : x_i > \lambda\} = T(x) \cap C.$$

Therefore,

$$F(x \wedge \mathbf{1}_C) \ge \mathbf{E}[f(T(x) \cap C)].$$

Due to the definition of a threshold set, with probability $t$ we have $\lambda < t$ and $T(x)$ contains $A = \{i : x_i = t\} \supseteq C \setminus C'$.
Then, $f(T(x) \cap C) + f(C') \ge f(C)$ by submodularity. We conclude that

$$F(x \wedge \mathbf{1}_C) \ge t\,(f(C) - f(C')). \tag{4}$$

Next, let's analyze $F(x \vee \mathbf{1}_{C'})$. We consider the ground set partitioned into $X = C \cup \bar{C}$, and we apply
Lemma 19. We get

$$F(x \vee \mathbf{1}_{C'}) \ge \mathbf{E}[f((T_1(x \vee \mathbf{1}_{C'}) \cap C) \cup (T_2(x \vee \mathbf{1}_{C'}) \cap \bar{C}))].$$

The random threshold sets look as follows: $T_1(x \vee \mathbf{1}_{C'}) \cap C = (T_1(x) \cup C') \cap C$ is equal to $C$ with probability
$t$, and equal to $C'$ otherwise. $T_2(x \vee \mathbf{1}_{C'}) \cap \bar{C} = T_2(x) \cap \bar{C}$ is empty with probability at least $1 - t$. (We ignore the
contribution when $T_2(x) \cap \bar{C} \ne \emptyset$.) Because $T_1$ and $T_2$ are independently sampled, we get

$$F(x \vee \mathbf{1}_{C'}) \ge t(1-t) f(C) + (1-t)^2 f(C').$$

Provided that $t \in [0, \frac{1}{2}(3-\sqrt{5})]$, we have $t \le (1-t)^2$. Then, we can write

$$F(x \vee \mathbf{1}_{C'}) \ge t(1-t) f(C) + t f(C'). \tag{5}$$

Combining equations (4) and (5), we get

$$F(x \vee \mathbf{1}_{C'}) + F(x \wedge \mathbf{1}_C) \ge t\,(f(C) - f(C')) + t(1-t) f(C) + t f(C') = (2t - t^2) f(C).$$

Therefore,

$$F(x) \ge \frac{1}{2}\bigl(F(x \vee \mathbf{1}_{C'}) + F(x \wedge \mathbf{1}_C)\bigr) \ge \bigl(t - \tfrac{1}{2}t^2\bigr) f(C).$$

Finally, we apply the pipage rounding technique, which does not lose anything in terms of objective value
(see Lemma 22). □

Technical remarks. In each step of the algorithm, we need to estimate values of $F(x)$ for given $x \in P_t(\mathcal{M})$.
We accomplish this by using the expression $F(x) = \mathbf{E}[f(R(x))]$ where $R(x)$ is a random set associated with
$x$. By standard bounds, if the values of $f(S)$ are in a range $[0, M]$, we can achieve accuracy $M/poly(n)$ using
a polynomial number of samples. We use the fact that $OPT \ge M/poly(n)$ (see Lemma 23 in the appendix) and
therefore we can achieve $OPT/poly(n)$ additive error in polynomial time.

We also relax the local step condition: we move to the next solution only if $F(x + \delta v) > F(x) + \frac{1}{poly(n)} OPT$
for a suitable polynomial in $n$. This way, we can only make a polynomial number of steps. When we terminate,
the local optimality conditions (Lemma 9) are satisfied within an additive error of $OPT/poly(n)$, which yields
a polynomially small error in the approximation bound.
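The sampling step described in these remarks can be sketched as follows. This is only an illustration under our own assumptions: the names `estimate_F` and `cut_edge` are hypothetical, the value oracle is modelled as a Python callable on frozensets, and the sample count is passed in by the caller rather than derived from the accuracy bounds.

```python
import random

def estimate_F(f, x, samples, rng):
    """Monte Carlo estimate of F(x) = E[f(R(x))].

    R(x) contains each element i independently with probability x[i].
    """
    total = 0.0
    for _ in range(samples):
        R = frozenset(i for i, xi in enumerate(x) if rng.random() < xi)
        total += f(R)
    return total / samples

def cut_edge(S):
    """Toy value oracle (an assumption for illustration): cut function of one edge {0, 1}."""
    return 1.0 if len(S & {0, 1}) == 1 else 0.0

rng = random.Random(0)
print(estimate_F(cut_edge, [0.5, 0.5], 20000, rng))  # close to F(1/2, 1/2) = 1/2
```

For values of $f$ in $[0, M]$ the standard deviation of the estimate is at most $M/(2\sqrt{\text{samples}})$, so polynomially many samples give the $M/poly(n)$ accuracy mentioned above with high probability.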

4.2 Matroid base constraint 

Let us move on to the problem $\max\{f(S) : S \in \mathcal{B}\}$ where $\mathcal{B}$ are the bases of a matroid. For a fixed $t \in [0, 1]$,
let us consider an algorithm which can be seen as local search inside the base polytope $B(\mathcal{M})$, further
constrained by the box $[0, t]^X$. The matroid base polytope is defined as $B(\mathcal{M}) = \mathrm{conv}\{\mathbf{1}_B : B \in \mathcal{B}\}$ or
equivalently [7] as $B(\mathcal{M}) = \{x \ge 0 : \forall S \subseteq X;\ \sum_{i \in S} x_i \le r_{\mathcal{M}}(S),\ \sum_{i \in X} x_i = r_{\mathcal{M}}(X)\}$, where $r_{\mathcal{M}}$ is the matroid
rank function of $\mathcal{M}$. Finally, we define

$$B_t(\mathcal{M}) = B(\mathcal{M}) \cap [0, t]^X = \{x \in B(\mathcal{M}) : \forall i;\ x_i \le t\}.$$

Observe that $B_t(\mathcal{M})$ is nonempty if and only if there is a convex linear combination $x = \sum_{B \in \mathcal{B}} \xi_B \mathbf{1}_B$ such
that $x_i \in [0, t]$ for all $i$. This is equivalent to saying that there is a linear combination $x' = \sum_{B \in \mathcal{B}} \xi'_B \mathbf{1}_B$ such
that $x'_i \in [0, 1]$ and $\sum_{B \in \mathcal{B}} \xi'_B = \frac{1}{t}$; in other words, the fractional base packing number is $\nu \ge \frac{1}{t}$. Since the optimal
fractional packing of bases can be found efficiently [25], we can efficiently find the minimum $t \in [\frac{1}{2}, 1]$ such
that $B_t(\mathcal{M}) \ne \emptyset$. Then, our algorithm is the following.

Fractional local search in $B_t(\mathcal{M})$
(given $t = \frac{r}{q}$, $r \le q$ integer)

1. Let $\delta = \frac{1}{q}$. Assume that $x \in B_t(\mathcal{M})$; adjust $x$ (using pipage rounding) so that each $x_i$ is an integer
multiple of $\delta$. In the following, this property will be maintained.

2. If there is a direction $v = e_j - e_i$ such that $x + \delta v \in B_t(\mathcal{M})$ and $F(x + \delta v) > F(x)$, then set $x := x + \delta v$
and repeat.

3. If there is no such direction $v$, apply pipage rounding to $x$ and return the resulting solution.
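Steps 1 and 2 can be illustrated on a toy case. The sketch below is not the paper's implementation: it assumes a uniform matroid of rank `k` on `n` elements (so that $B_t(\mathcal{M}) = \{x \in [0,t]^n : \sum_i x_i = k\}$ and membership is trivial to test), evaluates $F$ exactly by enumerating all subsets instead of sampling, and omits the final pipage rounding of step 3. The names `multilinear`, `fractional_local_search` and `path_cut` are ours.

```python
from fractions import Fraction
from itertools import combinations

def multilinear(f, x):
    """Exact F(x) = sum_S f(S) prod_{i in S} x_i prod_{j not in S} (1 - x_j)."""
    n = len(x)
    total = Fraction(0)
    for size in range(n + 1):
        for S in combinations(range(n), size):
            p = Fraction(1)
            for i in range(n):
                p *= x[i] if i in S else 1 - x[i]
            total += p * f(frozenset(S))
    return total

def fractional_local_search(f, n, k, r, q):
    """Swap-based local search in B_t(M), t = r/q, for a uniform matroid of rank k."""
    t, delta = Fraction(r, q), Fraction(1, q)
    # Starting point: fill coordinates up to t greedily; all entries are multiples of delta.
    units, x = k * q, []
    for _ in range(n):
        take = min(r, units)
        x.append(Fraction(take, q))
        units -= take
    if units:
        raise ValueError("B_t(M) is empty for these parameters")
    while True:
        current = multilinear(f, x)
        move = None
        for i in range(n):
            for j in range(n):
                # Direction e_j - e_i is feasible iff x_i >= delta and x_j + delta <= t.
                if i == j or x[i] < delta or x[j] + delta > t:
                    continue
                y = list(x)
                y[i] -= delta
                y[j] += delta
                if multilinear(f, y) > current:
                    move = y
                    break
            if move is not None:
                break
        if move is None:
            return x  # fractional local optimum in the sense of Lemma 12
        x = move

def path_cut(S):
    """Toy non-monotone submodular function: cut function of the path 0-1-2-3."""
    return sum(1 for u, v in [(0, 1), (1, 2), (2, 3)] if (u in S) != (v in S))

x = fractional_local_search(path_cut, n=4, k=1, r=2, q=4)
print(x, multilinear(path_cut, x))
```

The search terminates because $F$ strictly increases along a finite grid of points whose coordinates are multiples of $\delta$.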

Notes. We remark that the starting point can be found as a convex linear combination of $q$ bases, $x =
\frac{1}{q} \sum_{i=1}^{q} \mathbf{1}_{B_i}$, such that no element appears in more than $r$ of them, using matroid union techniques [25]. In
the algorithm, we maintain this representation. The local search step corresponds to switching a pair of
elements in one base, under the condition that no element is used in more than $r$ bases at the same time.
For now, we ignore the issues of estimating $F(x)$ and stopping the local search within polynomial time. We
discuss this at the end of this section.

Finally, we use pipage rounding to convert the fractional solution $x$ into an integral one of value at least
$F(x)$ (Lemma 20). Note that it is not necessarily true that any of the bases in a convex linear combination
$x = \sum_B \xi_B \mathbf{1}_B$ achieves the value $F(x)$.

Theorem 6. If there is a fractional packing of $\nu \in [1, 2]$ bases in $\mathcal{M}$, then the fractional local search algorithm
with $t = \frac{1}{\nu}$ returns a solution of value at least $\frac{1}{2}(1 - t)\,OPT$.

For example, assume that $\mathcal{M}$ contains two disjoint bases $B_1, B_2$ (which is the case considered in [16]).
Then, the algorithm can be used with $t = \frac{1}{2}$ and we obtain a $(\frac{1}{4} - o(1))$-approximation, improving
the $(\frac{1}{6} - o(1))$-approximation from [16]. If there is a fractional packing of more than 2 bases, our analysis
still gives only a $(\frac{1}{4} - o(1))$-approximation. If the dual matroid $\mathcal{M}^*$ admits a better fractional packing
of bases, we can consider the problem $\max\{f(\bar{S}) : S \in \mathcal{B}^*\}$, which is equivalent. For a uniform matroid,
$\mathcal{B} = \{B : |B| = k\}$, the fractional base packing number is either at least 2 or the same holds for the dual
matroid, $\mathcal{B}^* = \{B : |B| = n - k\}$ (as noted in [16]). Therefore, we get a $(\frac{1}{4} - o(1))$-approximation for any
uniform matroid. The value $t = 1$ can be used for any matroid, but it does not yield any approximation
guarantee.

Analysis of the algorithm. We turn to the properties of fractional local optima. We will prove that
the point $x$ found by the fractional local search algorithm satisfies the following conditions that allow us to
compare $F(x)$ to the actual optimum.

Lemma 12. The outcome of the fractional local search algorithm is a "fractional local optimum" in the
following sense.

• For any $i, j$ such that $x + \delta(e_j - e_i) \in B_t(\mathcal{M})$,

$$\frac{\partial F}{\partial x_j} - \frac{\partial F}{\partial x_i} \le 0.$$

(The partial derivatives are evaluated at $x$.)

Proof. Observe that the coordinates of $x$ are always integer multiples of $\delta$, therefore if it is possible to move
from $x$ in the direction of $v = e_j - e_i$ by any nonzero amount, then it is possible to move by $\delta v$. We use the
property that for any direction $v = e_j - e_i$, the function $F(x + \lambda v)$ is a convex function of $\lambda$ [3]. Therefore,
if $\frac{dF}{d\lambda} > 0$ and it is possible to move in the direction of $v$, we would get $F(x + \delta v) > F(x)$ and the fractional
local search would continue. For $v = e_j - e_i$, we get

$$\frac{dF}{d\lambda} = \frac{\partial F}{\partial x_j} - \frac{\partial F}{\partial x_i} \le 0.$$

□

In the following, we refer to the following exchange property for matroid bases (see [25], Corollary 39.21a).

Lemma 13. For any $B_1, B_2 \in \mathcal{B}$, there is a bijection $\pi : B_1 \setminus B_2 \to B_2 \setminus B_1$ such that $\forall i \in B_1 \setminus B_2$;
$B_1 - i + \pi(i) \in \mathcal{B}$.

Using this, we prove a lemma about fractional local optima analogous to Lemma 2.2 in [16].

Lemma 14. Let $x$ be the outcome of fractional local search over $B_t(\mathcal{M})$. Let $C \in \mathcal{B}$ be any base. Then there
is $c \in [0, 1]^X$ satisfying

• $c_i = t$, if $i \in C$ and $x_i = t$

• $c_i = 1$, if $i \in C$ and $x_i < t$

• $0 \le c_i \le x_i$, if $i \notin C$

such that

$$2F(x) \ge F(x \vee c) + F(x \wedge c).$$

Note that for $t = 1$, we can set $c = \mathbf{1}_C$. However, in general we need this more complicated formulation.
Intuitively, $c$ is obtained from $x$ by raising the variables $x_i$, $i \in C$, and decreasing $x_i$ for $i \notin C$. However,
we can only raise the variables $x_i$, $i \in C$, where $x_i$ is below the threshold $t$; otherwise we do not have any
information about $\frac{\partial F}{\partial x_i}$. Also, we do not necessarily decrease all the variables outside of $C$ to zero.

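Proof. Let $C \in \mathcal{B}$ and assume $x \in B_t(\mathcal{M})$ is a fractional local optimum. We can decompose $x$ into a convex
linear combination of vertices of $B(\mathcal{M})$, $x = \sum_B \xi_B \mathbf{1}_B$. By Lemma 13, for each base $B$ there is a bijection
$\pi_B : B \setminus C \to C \setminus B$ such that $\forall i \in B \setminus C$; $B - i + \pi_B(i) \in \mathcal{B}$.

We define $C' = \{i \in C : x_i < t\}$. The reason we consider $C'$ is that if $x_i = t$, there is no room for an
exchange step increasing $x_i$, and therefore Lemma 12 does not give any information about $\frac{\partial F}{\partial x_i}$. We construct
the vector $c$ by starting from $x$, and for each $B$ swapping the elements in $B \setminus C$ for their image under $\pi_B$,
provided it is in $C'$, until we raise the coordinates on $C'$ to $c_i = 1$. Formally, we set $c_i = 1$ for $i \in C'$, $c_i = t$
for $i \in C \setminus C'$, and for each $i \notin C$, we define

$$c_i = x_i - \sum_{B : i \in B,\ \pi_B(i) \in C'} \xi_B.$$

In the following, all partial derivatives are evaluated at $x$. By the smooth submodularity of $F(x)$ (see
[27]),

$$F(x \vee c) - F(x) \le \sum_{j : c_j > x_j} (c_j - x_j) \frac{\partial F}{\partial x_j} = \sum_{j \in C'} (1 - x_j) \frac{\partial F}{\partial x_j} = \sum_{B} \sum_{j \in C' \setminus B} \xi_B \frac{\partial F}{\partial x_j} \tag{6}$$

because $\sum_{B : j \notin B} \xi_B = 1 - x_j$ for any $j$. On the other hand, also by smooth submodularity,

$$F(x) - F(x \wedge c) \ge \sum_{i : c_i < x_i} (x_i - c_i) \frac{\partial F}{\partial x_i} = \sum_{i \notin C} (x_i - c_i) \frac{\partial F}{\partial x_i} = \sum_{i \notin C} \sum_{B : i \in B,\ \pi_B(i) \in C'} \xi_B \frac{\partial F}{\partial x_i}$$

using our definition of $c_i$. In the last sum, for any nonzero contribution, we have $\xi_B > 0$, $i \in B$ and
$j = \pi_B(i) \in C'$, i.e. $x_j < t$. Therefore it is possible to move in the direction $e_j - e_i$ (we can switch from $B$
to $B - i + j$). By Lemma 12,

$$\frac{\partial F}{\partial x_j} - \frac{\partial F}{\partial x_i} \le 0.$$

Therefore, we get

$$F(x) - F(x \wedge c) \ge \sum_{i \notin C} \sum_{B : i \in B,\ j = \pi_B(i) \in C'} \xi_B \frac{\partial F}{\partial x_j} = \sum_{B} \sum_{i \in B \setminus C :\ j = \pi_B(i) \in C'} \xi_B \frac{\partial F}{\partial x_j}. \tag{7}$$

By the bijective property of $\pi_B$, this is equal to $\sum_B \sum_{j \in C' \setminus B} \xi_B \frac{\partial F}{\partial x_j}$. Putting (6) and (7) together, we get
$F(x \vee c) - F(x) \le F(x) - F(x \wedge c)$. □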

Now we are ready to prove Theorem 6.

Proof. Assuming that $B_t(\mathcal{M}) \ne \emptyset$, we can find a starting point $x_0 \in B_t(\mathcal{M})$. From this point, we reach a
fractional local optimum $x \in B_t(\mathcal{M})$ (see Lemma 12). We want to compare $F(x)$ to the actual optimum;
assume that $OPT = f(C)$.

As before, we define $C' = \{i \in C : x_i < t\}$. By Lemma 14, we know that the fractional local optimum
satisfies:

$$2F(x) \ge F(x \vee c) + F(x \wedge c) \tag{8}$$

for some vector $c$ such that $c_i = t$ for all $i \in C \setminus C'$, $c_i = 1$ for $i \in C'$ and $0 \le c_i \le x_i$ for $i \notin C$.
First, let's analyze $F(x \vee c)$. We have

• $(x \vee c)_i = 1$ for all $i \in C'$.

• $(x \vee c)_i = t$ for all $i \in C \setminus C'$.

• $(x \vee c)_i \le t$ for all $i \notin C$.

We apply Lemma 19 to the partition $X = C \cup \bar{C}$. We get

$$F(x \vee c) \ge \mathbf{E}[f((T_1(x \vee c) \cap C) \cup (T_2(x \vee c) \cap \bar{C}))]$$

where $T_1(x)$ and $T_2(x)$ are independent threshold sets. Based on the information above, $T_1(x \vee c) \cap C = C$
with probability $t$ and $T_1(x \vee c) \cap C = C'$ otherwise. On the other hand, $T_2(x \vee c) \cap \bar{C} = \emptyset$ with probability
at least $1 - t$. These two events are independent. We conclude that on the right-hand side, we get $f(C)$ with
probability at least $t(1 - t)$, or $f(C')$ with probability at least $(1 - t)^2$:

$$F(x \vee c) \ge t(1 - t) f(C) + (1 - t)^2 f(C'). \tag{9}$$

Turning to $F(x \wedge c)$, we see that

• $(x \wedge c)_i = x_i$ for all $i \in C'$.

• $(x \wedge c)_i = t$ for all $i \in C \setminus C'$.

• $(x \wedge c)_i \le t$ for all $i \notin C$.

We apply Lemma 19 to $X = C \cup \bar{C}$.

$$F(x \wedge c) \ge \mathbf{E}[f((T_1(x \wedge c) \cap C) \cup (T_2(x \wedge c) \cap \bar{C}))].$$

With probability $t$, $T_1(x \wedge c) \cap C$ contains $C \setminus C'$ (and maybe some elements of $C'$). In this case, $f(T_1(x \wedge
c) \cap C) \ge f(C) - f(C')$ by submodularity. Also, $T_2(x \wedge c) \cap \bar{C}$ is empty with probability at least $1 - t$. Again,
these two events are independent. Therefore, $F(x \wedge c) \ge t(1 - t)(f(C) - f(C'))$. If $f(C') > f(C)$, this bound
is vacuous; otherwise, we can replace $t(1 - t)$ by $(1 - t)^2$, because $t \ge 1/2$. In any case,

$$F(x \wedge c) \ge (1 - t)^2 (f(C) - f(C')). \tag{10}$$

Combining (8), (9) and (10),

$$F(x) \ge \frac{1}{2}\bigl(F(x \vee c) + F(x \wedge c)\bigr) \ge \frac{1}{2}\bigl(t(1 - t) f(C) + (1 - t)^2 f(C)\bigr) = \frac{1}{2}(1 - t) f(C).$$

□
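As a check on the last step of this proof, the sum of the bounds (9) and (10) can be expanded term by term; the $f(C')$ contributions cancel, which is why the final factor depends on $t$ only.

```latex
\begin{align*}
F(x \vee c) + F(x \wedge c)
  &\ge t(1-t)\, f(C) + (1-t)^2 f(C') + (1-t)^2 \bigl(f(C) - f(C')\bigr) \\
  &= \bigl(t(1-t) + (1-t)^2\bigr) f(C) \\
  &= (1-t)\, f(C).
\end{align*}
```

For instance, with $t = \frac{1}{2}$ (two disjoint bases) this gives $F(x) \ge \frac{1}{4} f(C)$, matching the factor quoted after Theorem 6.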



Technical remarks. Again, we have to deal with the issues of estimating $F(x)$ and stopping the local
search in polynomial time. We do this exactly as we did at the end of Section 4.1. One issue to be careful
about here is that if $f : 2^X \to [0, M]$, our estimates of $F(x)$ are within an additive error of $M/poly(n)$.
If the optimum value $OPT = \max\{f(S) : S \in \mathcal{B}\}$ is very small compared to $M$, the error might be large
compared to $OPT$, which would be a problem. The optimum could in fact be very small in general. But it
holds that if $\mathcal{M}$ contains no loops and co-loops (which can be eliminated easily), then $OPT \ge M/poly(n)$ (see
Appendix C). Then, our sampling errors are on the order of $OPT/poly(n)$, which yields a $1/poly(n)$ error in
the approximation bound.



5 Approximation for symmetric instances 

We can achieve a better approximation assuming that the instance exhibits a certain symmetry. This is
the same kind of symmetry that we use in our hardness construction (Section 3) and the hard instances
exhibit the same symmetry as well. It turns out that our approximation in this case matches the hardness
threshold up to lower order terms. Hence, we can say that the case of symmetric instances is now completely
understood.

Similar to our hardness result, the symmetries that we consider here are permutations of the ground set
$X$, corresponding to permutations of coordinates in $\mathbb{R}^X$. We start with some basic properties which are
helpful in analyzing symmetric instances.

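Lemma 15. Assume that $f : 2^X \to \mathbb{R}$ is invariant with respect to a group of permutations $\mathcal{G}$ and $F(x) =
\mathbf{E}[f(\hat{x})]$ is its multilinear extension. Then for any symmetrized vector $\bar{c} = \mathbf{E}_{\sigma \in \mathcal{G}}[\sigma(c)]$, $\nabla F|_{\bar{c}}$ is also symmetric w.r.t. $\mathcal{G}$. I.e., for any
$\tau \in \mathcal{G}$,

$$\tau(\nabla F|_{x = \bar{c}}) = \nabla F|_{x = \bar{c}}.$$

Proof. Since $f(S)$ is invariant under $\mathcal{G}$, so is $F(x)$, i.e. $F(x) = F(\tau(x))$ for any $\tau \in \mathcal{G}$. Differentiating both
sides at $x = c$, we get by the chain rule:

$$\frac{\partial F}{\partial x_i}\Big|_{x = c} = \sum_{j} \frac{\partial F}{\partial x_j}\Big|_{x = \tau(c)} \frac{\partial (\tau(x))_j}{\partial x_i}.$$

Here, $\frac{\partial (\tau(x))_j}{\partial x_i} = 1$ if $\tau(j) = i$, and $0$ otherwise. Therefore,

$$\frac{\partial F}{\partial x_i}\Big|_{x = c} = \frac{\partial F}{\partial x_{\tau^{-1}(i)}}\Big|_{x = \tau(c)}.$$

Note that $\tau(\bar{c}) = \mathbf{E}_{\sigma \in \mathcal{G}}[\tau(\sigma(c))] = \mathbf{E}_{\sigma \in \mathcal{G}}[\sigma(c)] = \bar{c}$ since the distribution of $\tau \circ \sigma$ is equal to the distribution
of $\sigma$. Therefore,

$$\frac{\partial F}{\partial x_i}\Big|_{x = \bar{c}} = \frac{\partial F}{\partial x_{\tau^{-1}(i)}}\Big|_{x = \bar{c}}$$

for any $\tau \in \mathcal{G}$. □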

Next, we prove that the "symmetric optimum" $\max\{F(\bar{x}) : x \in P(\mathcal{F})\}$ gives a solution which is a local
optimum for the original instance $\max\{F(x) : x \in P(\mathcal{F})\}$. (As we proved in Section 3, in general we cannot
hope to find a better solution than the symmetric optimum.)

Lemma 16. Let $f : 2^X \to \mathbb{R}$ and $\mathcal{F} \subseteq 2^X$ be invariant with respect to a group of permutations $\mathcal{G}$. Let
$\overline{OPT} = \max\{F(\bar{x}) : x \in P(\mathcal{F})\}$ where $\bar{x} = \mathbf{E}_{\sigma \in \mathcal{G}}[\sigma(x)]$, and let $x_0$ be the symmetric point where $\overline{OPT}$ is
attained ($\bar{x}_0 = x_0$). Then $x_0$ is a local optimum for the problem $\max\{F(x) : x \in P(\mathcal{F})\}$, in the sense that
$(x - x_0) \cdot \nabla F|_{x_0} \le 0$ for any $x \in P(\mathcal{F})$.

Proof. Assume for the sake of contradiction that $(x - x_0) \cdot \nabla F|_{x_0} > 0$ for some $x \in P(\mathcal{F})$. We use the
symmetric properties of $f$ and $\mathcal{F}$ to show that $(\bar{x} - x_0) \cdot \nabla F|_{x_0} > 0$ as well. Recall that $\bar{x}_0 = x_0$. We have

$$(\bar{x} - x_0) \cdot \nabla F|_{x_0} = \mathbf{E}_{\sigma \in \mathcal{G}}[\sigma(x - x_0) \cdot \nabla F|_{x_0}] = \mathbf{E}_{\sigma \in \mathcal{G}}[(x - x_0) \cdot \sigma^{-1}(\nabla F|_{x_0})] = (x - x_0) \cdot \nabla F|_{x_0} > 0$$

using Lemma 15. Hence, there would be a direction $\bar{x} - x_0$ along which an improvement can be obtained.
But then, consider a small $\delta > 0$ such that $x_1 = x_0 + \delta(\bar{x} - x_0) \in P(\mathcal{F})$ and also $F(x_1) > F(x_0)$. The point
$x_1$ is symmetric ($\bar{x}_1 = x_1$) and hence it would contradict the assumption that $F(x_0) = \overline{OPT}$. □



5.1 Submodular maximization over independent sets 

Let us derive an optimal approximation result for the problem $\max\{f(S) : S \in \mathcal{I}\}$ under the assumption
that the instance is "element-transitive". This means that there is a group of permutations $\mathcal{G}$ such that the
orbit of any element is the entire ground set $X$, and our instance is invariant under $\mathcal{G}$. Then, we show that
it is easy to achieve an optimal $(\frac{1}{2} - o(1))$-approximation.

Theorem 7. Let $\max\{f(S) : S \in \mathcal{I}\}$ be an instance symmetric with respect to an element-transitive group
of permutations $\mathcal{G}$. Let $\overline{OPT} = \max\{F(\bar{x}) : x \in P(\mathcal{M})\}$ where $\bar{x} = \mathbf{E}_{\sigma \in \mathcal{G}}[\sigma(x)]$. Then $\overline{OPT} \ge \frac{1}{2} OPT$.

Proof. Let $OPT = f(C)$. By Lemma 16, $\overline{OPT} = F(x_0)$ where $x_0$ is a local optimum for the problem
$\max\{F(x) : x \in P(\mathcal{M})\}$. This means it is also a local optimum in the sense of Lemma 9 with $t = 1$. By
Lemma 11,

$$2F(x_0) \ge F(x_0 \vee \mathbf{1}_C) + F(x_0 \wedge \mathbf{1}_C).$$

Also, $\bar{x}_0 = x_0$. As we are dealing with an element-transitive group of symmetries, this means all the
coordinates of $x_0$ are equal, $x_0 = (\xi, \ldots, \xi)$. Therefore, $x_0 \vee \mathbf{1}_C$ is equal to 1 on $C$ and $\xi$ outside of $C$. By
Lemma 18,

$$F(x_0 \vee \mathbf{1}_C) \ge (1 - \xi) f(C).$$

Similarly, $x_0 \wedge \mathbf{1}_C$ is equal to $\xi$ on $C$ and 0 outside of $C$. By Lemma 18,

$$F(x_0 \wedge \mathbf{1}_C) \ge \xi f(C).$$

Combining the two bounds,

$$2F(x_0) \ge F(x_0 \vee \mathbf{1}_C) + F(x_0 \wedge \mathbf{1}_C) \ge (1 - \xi) f(C) + \xi f(C) = f(C) = OPT.$$

□

Since all symmetric solutions $x = (\xi, \xi, \ldots, \xi)$ form a 1-parameter family, and $F(\xi, \ldots, \xi)$ is a concave
function, we can search for the best symmetric solution (within any desired accuracy) by binary search. By
standard techniques, we get the following.

Corollary 2. There is a $(\frac{1}{2} - o(1))$-approximation ("brute force" search over symmetric solutions) for the
problem $\max\{f(S) : S \in \mathcal{I}\}$ for instances symmetric under an element-transitive group of permutations.
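The one-dimensional search over symmetric solutions can be sketched as follows. The text speaks of binary search; this sketch uses ternary search instead, a standard alternative for maximizing a concave function from value queries alone. The function names, the feasible range $[0, 1]$ for $\xi$, and the toy objective (the cut function of a single edge, whose multilinear extension on the diagonal is $2\xi(1-\xi)$) are our assumptions for illustration.

```python
def best_symmetric(F_diag, lo=0.0, hi=1.0, tol=1e-9):
    """Ternary search for a maximizer of a concave function xi -> F(xi, ..., xi) on [lo, hi]."""
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if F_diag(m1) < F_diag(m2):
            lo = m1   # a maximizer lies in [m1, hi]
        else:
            hi = m2   # a maximizer lies in [lo, m2]
    xi = (lo + hi) / 2
    return xi, F_diag(xi)

def F_cut_edge(xi):
    """Diagonal of the multilinear extension of the cut function of one edge."""
    return 2 * xi * (1 - xi)

xi, value = best_symmetric(F_cut_edge)
print(xi, value)
```

On this toy instance the best symmetric value is one half of the integral optimum (cutting the edge has value 1), so the factor in Theorem 7 cannot be improved by searching over symmetric points alone.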

The hard instances for submodular maximization subject to a matroid independence constraint correspond
to refinements of the Max Cut instance for the graph $K_2$ (Section 2). It is easy to see that such instances
are element-transitive, and it follows from Section 3 that a $(\frac{1}{2} + \epsilon)$-approximation for such instances would
require exponentially many value queries. Therefore, our approximation for element-transitive instances is
optimal.

5.2 Submodular maximization over bases 

Let us come back to the problem of submodular maximization over the bases of a matroid. The property that
$\overline{OPT}$ is a local optimum with respect to the original problem $\max\{F(x) : x \in P(\mathcal{F})\}$ is very useful in arguing
about the value of $\overline{OPT}$. We already have tools to deal with local optima from Section 4.2. Here we prove
the following.

Lemma 17. Let $B(\mathcal{M})$ be the matroid base polytope of $\mathcal{M}$ and $x_0 \in B(\mathcal{M})$ a local maximum for the
submodular maximization problem $\max\{F(x) : x \in B(\mathcal{M})\}$, in the sense that $(x - x_0) \cdot \nabla F|_{x_0} \le 0$ for any
$x \in B(\mathcal{M})$. Assume in addition that $x_0 \in [s, t]^X$. Then

$$F(x_0) \ge \frac{1}{2}(1 - t + s) \cdot OPT.$$

Proof. Let $OPT = \max\{f(B) : B \in \mathcal{B}\} = f(C)$. We assume that $x_0 \in B(\mathcal{M})$ is a local optimum with
respect to any direction $x - x_0$, $x \in B(\mathcal{M})$, so it is also a local optimum with respect to the fractional local
search in the sense of Lemma 12 with $t = 1$. Lemma 14 (where for $t = 1$ we can take $c = \mathbf{1}_C$) implies that

$$2F(x_0) \ge F(x_0 \vee \mathbf{1}_C) + F(x_0 \wedge \mathbf{1}_C).$$

By assumption, the coordinates of $x_0 \vee \mathbf{1}_C$ are equal to 1 on $C$ and at most $t$ outside of $C$. With probability
$1 - t$, a random threshold in $[0, 1]$ falls between $t$ and 1, and Lemma 18 implies that

$$F(x_0 \vee \mathbf{1}_C) \ge (1 - t) \cdot f(C).$$

Similarly, the coordinates of $x_0 \wedge \mathbf{1}_C$ are 0 outside of $C$, and at least $s$ on $C$. A random threshold falls between
0 and $s$ with probability $s$, and Lemma 18 implies that

$$F(x_0 \wedge \mathbf{1}_C) \ge s \cdot f(C).$$

Putting these inequalities together, we get

$$2F(x_0) \ge F(x_0 \vee \mathbf{1}_C) + F(x_0 \wedge \mathbf{1}_C) \ge (1 - t + s) \cdot f(C).$$

□

Totally symmetric instances. The application we have in mind here is a special case of submodular 
maximization over the bases of a matroid, which we call totally symmetric. 

Definition 5. We call an instance $\max\{f(S) : S \in \mathcal{F}\}$ totally symmetric with respect to a group of permutations
$\mathcal{G}$, if both $f(S)$ and $\mathcal{F}$ are invariant under $\mathcal{G}$ and moreover, there is a point $c \in P(\mathcal{F})$ such that
$c = \bar{x} = \mathbf{E}_{\sigma \in \mathcal{G}}[\sigma(x)]$ for every $x \in P(\mathcal{F})$. We call $c$ the center of the instance.

Note that this is indeed stronger than just being invariant under $\mathcal{G}$. For example, an instance on a ground
set $X = X_1 \cup X_2$ could be symmetric with respect to any permutation of $X_1$ and any permutation of $X_2$.
For any $x \in P(\mathcal{F})$, the symmetric vector $\bar{x}$ is constant on $X_1$ and constant on $X_2$. However, in a totally
symmetric instance, there should be a unique symmetric point.

Bases of partition matroids. A canonical example of a totally symmetric instance is as follows. Let
$X = X_1 \cup X_2 \cup \ldots \cup X_m$ and let integers $k_1, \ldots, k_m$ be given. This defines a partition matroid $\mathcal{M} = (X, \mathcal{B})$,
whose bases are

$$\mathcal{B} = \{B : \forall j;\ |B \cap X_j| = k_j\}.$$

The associated matroid base polytope is

$$B(\mathcal{M}) = \{x \in [0, 1]^X : \forall j;\ \sum_{i \in X_j} x_i = k_j\}.$$

This matroid is invariant under any group of permutations $\mathcal{G}$ which maps each $X_j$ to itself. In particular,
assume that the orbit of each element $i \in X_j$ is the entire part $X_j$. This implies that for any $x \in B(\mathcal{M})$,
$\bar{x}$ is the same vector, with coordinates $k_j/|X_j|$ on $X_j$. If $f(S)$ is also invariant under $\mathcal{G}$, we have a totally
symmetric instance $\max\{f(S) : S \in \mathcal{B}\}$. The center point can be found by taking any feasible solution
and symmetrizing it w.r.t. $\mathcal{G}$. We show that for such instances, the center point achieves an improved
approximation.

Theorem 8. Let $\max\{f(S) : S \in \mathcal{B}\}$ be a totally symmetric instance. Let the fractional packing number of
bases be $\nu$ and the fractional packing number of dual bases $\nu^*$. Then the center point $c$ satisfies

$$F(c) \ge \Bigl(1 - \frac{1}{2\nu} - \frac{1}{2\nu^*}\Bigr) OPT.$$

Recall that in the general case, we get a $\frac{1}{2}(1 - 1/\nu - o(1))$-approximation (Theorem 6). By passing to the
dual matroid, we can also obtain a $\frac{1}{2}(1 - 1/\nu^* - o(1))$-approximation, so in general, we know how to achieve
a $\frac{1}{2}(1 - 1/\max\{\nu, \nu^*\} - o(1))$-approximation. For totally symmetric instances where $\nu = \nu^*$, we improve this
to the optimal factor of $1 - 1/\nu$.

Proof. Since there is a unique center $c = \bar{x}$ for any $x \in B(\mathcal{M})$, this means this is also the symmetric
optimum $F(c) = \max\{F(\bar{x}) : x \in B(\mathcal{M})\}$. Due to Lemma 16, $c$ is a local optimum for the problem
$\max\{F(x) : x \in B(\mathcal{M})\}$.

Because the fractional packing number of bases is $\nu$, we have $c_i \le 1/\nu$ for all $i$. Similarly, because the
fractional packing number of dual bases (complements of bases) is $\nu^*$, we have $1 - c_i \le 1/\nu^*$. This means
that $c \in [1 - 1/\nu^*, 1/\nu]^X$. Lemma 17 implies that

$$2F(c) \ge \Bigl(1 - \frac{1}{\nu} + 1 - \frac{1}{\nu^*}\Bigr) OPT.$$

□

Corollary 3. Let $\max\{f(S) : S \in \mathcal{B}\}$ be an instance on a partition matroid where every base takes at least
an $\alpha$-fraction of each part, at most a $(1 - \alpha)$-fraction of each part, and the submodular function $f(S)$ is
invariant under a group $\mathcal{G}$ where the orbit of each $i \in X_j$ is $X_j$. Then, the center point $c = \mathbf{E}_{\sigma \in \mathcal{G}}[\sigma(\mathbf{1}_B)]$
(equal for any $B \in \mathcal{B}$) satisfies $F(c) \ge \alpha \cdot OPT$.

Proof. If the orbit of any element $i \in X_j$ is the entire set $X_j$, it also means that $\sigma(i)$ for a random $\sigma \in \mathcal{G}$
is uniformly distributed over $X_j$ (by the transitive property of $\mathcal{G}$). Therefore, symmetrizing any fractional
vector $x \in B(\mathcal{M})$ gives the same vector $\bar{x} = c$, where $c_i = k_j/|X_j|$ for $i \in X_j$. Also, our assumptions mean
that the fractional packing number of bases is $1/(1 - \alpha)$, and the fractional packing number of dual bases is
also $1/(1 - \alpha)$. Due to Theorem 8, the center $c$ satisfies $F(c) \ge \alpha \cdot OPT$. □
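For concreteness, the factor $\alpha$ can be verified by substituting $\nu = \nu^* = 1/(1-\alpha)$ into the bound $2F(c) \ge (1 - \frac{1}{\nu} + 1 - \frac{1}{\nu^*})\,OPT$ obtained in the proof of Theorem 8:

```latex
\[
F(c) \;\ge\; \frac{1}{2}\Bigl(1 - \frac{1}{\nu} + 1 - \frac{1}{\nu^*}\Bigr)\, OPT
     \;=\; \frac{1}{2}\bigl(2 - 2(1-\alpha)\bigr)\, OPT
     \;=\; \alpha \cdot OPT .
\]
```

Equivalently, the center lies in $[\alpha, 1-\alpha]^X$, so Lemma 17 applies with $s = \alpha$ and $t = 1 - \alpha$, giving $\frac{1}{2}(1 - t + s) = \alpha$.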

The hard instances for submodular maximization over matroid bases that we describe in Section 2 are
exactly of this form (see the last paragraph of Section 2, with $\alpha = 1/k$). There is a unique symmetric
solution, $x = (\alpha, \alpha, \ldots, \alpha, 1 - \alpha, 1 - \alpha, \ldots, 1 - \alpha)$. The fractional base packing number for these matroids is
$\nu = 1/(1 - \alpha)$ and Theorem 3 implies that any $(\alpha + \epsilon) = (1 - 1/\nu + \epsilon)$-approximation for such matroids would
require exponentially many value queries. Therefore, our approximation in this special case is optimal.

A Submodular functions and their extensions

In this appendix, we present a few basic facts concerning submodular functions $f(S)$ and their continuous
extensions. By $f_A(S)$, we denote the marginal value of $S$ w.r.t. $A$, $f_A(S) = f(A \cup S) - f(A)$. We also use
$f_A(i) = f(A + i) - f(A)$. The notation $A + i$ is shorthand for $A \cup \{i\}$. Similarly, we write $A - i$ to denote
$A \setminus \{i\}$.

Definition 6. The multilinear extension of a function $f : 2^X \to \mathbb{R}$ is a function $F : [0, 1]^X \to \mathbb{R}$ where

$$F(x) = \sum_{S \subseteq X} f(S) \prod_{i \in S} x_i \prod_{j \notin S} (1 - x_j).$$

Definition 7. The Lovász extension of a function $f : 2^X \to \mathbb{R}$ is a function $\hat{F} : [0, 1]^X \to \mathbb{R}$ such that

$$\hat{F}(x) = \sum_{i=0}^{n} (x_{\pi(i)} - x_{\pi(i+1)})\, f(\{\pi(j) : 1 \le j \le i\})$$

where $\pi : [n] \to X$ is a bijection such that $x_{\pi(1)} \ge x_{\pi(2)} \ge \cdots \ge x_{\pi(n)}$; we interpret $x_{\pi(0)}, x_{\pi(n+1)}$ as
$x_{\pi(0)} = 1$, $x_{\pi(n+1)} = 0$.

A useful way to view the multilinear extension is that we sample a random set $R(x)$, where each element
$i$ appears independently with probability $x_i$, and we take $F(x) = \mathbf{E}[f(R(x))]$. The Lovász extension can be
viewed similarly, by sampling a random set in a correlated fashion.

Definition 8. For a vector $x \in [0, 1]^X$, we define the "random threshold set" $T(x)$ by taking a uniformly
random $\lambda \in [0, 1]$, and setting

$$T(x) = \{i \in X : x_i > \lambda\}.$$

Assuming that $x_1 \ge x_2 \ge \cdots \ge x_n$, $x_0 = 1$, $x_{n+1} = 0$, it is easy to see that the Lovász extension of $f(S)$ is
equal to

$$\hat{F}(x) = \sum_{i=0}^{n} (x_i - x_{i+1})\, f([i]) = \mathbf{E}[f(T(x))].$$

It is known that the Lovász extension of a submodular function is always convex [18], which is not true for
the multilinear extension. We prove that the Lovász extension is always upper-bounded by the multilinear
extension; this lemma appears quite basic but it has not been published to our knowledge.

Lemma 18. Let $F(x)$ denote the multilinear extension and let $\hat{F}(x)$ denote the Lovász extension of a submodular
function $f : 2^X \to \mathbb{R}$. Then

$$F(x) \ge \hat{F}(x).$$

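Proof. Let $R(x)$ be a random set where elements are sampled independently with probabilities $x_i$, and let
$T(x)$ be the random threshold set. I.e., we want to prove $\mathbf{E}[f(R(x))] \ge \mathbf{E}[f(T(x))]$. We can assume WLOG
that the elements are ordered so that $x_1 \ge x_2 \ge \cdots \ge x_n$. We also let $x_0 = 1$ and $x_{n+1} = 0$. Then,

$$\mathbf{E}[f(R(x))] = f(\emptyset) + \sum_{k=1}^{n} \mathbf{E}[f(R(x) \cap [k]) - f(R(x) \cap [k-1])] = f(\emptyset) + \sum_{k=1}^{n} \mathbf{E}[f_{R(x) \cap [k-1]}(R(x) \cap \{k\})]$$

and by submodularity,

$$\mathbf{E}[f(R(x))] \ge f(\emptyset) + \sum_{k=1}^{n} \mathbf{E}[f_{[k-1]}(R(x) \cap \{k\})]$$
$$= f(\emptyset) + \sum_{k=1}^{n} x_k f_{[k-1]}(k)$$
$$= f(\emptyset) + \sum_{k=1}^{n} x_k \bigl(f([k]) - f([k-1])\bigr)$$
$$= \sum_{k=0}^{n} (x_k - x_{k+1})\, f([k])$$
$$= \mathbf{E}[f(T(x))].$$

□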
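Lemma 18 is easy to test numerically on small ground sets. The sketch below computes both extensions exactly for a toy submodular function (the cut function of a triangle, chosen by us for illustration) and compares them at a sample point. The function names are ours; the Lovász extension is evaluated with the sorted-order formula given before the lemma.

```python
from itertools import combinations

def multilinear(f, x):
    """F(x) = sum over all S of f(S) * prod_{i in S} x_i * prod_{j not in S} (1 - x_j)."""
    n = len(x)
    total = 0.0
    for size in range(n + 1):
        for S in combinations(range(n), size):
            p = 1.0
            for i in range(n):
                p *= x[i] if i in S else 1.0 - x[i]
            total += p * f(frozenset(S))
    return total

def lovasz(f, x):
    """Lovasz extension: sum_{k=0}^{n} (x_(k) - x_(k+1)) f(first k elements in sorted order)."""
    n = len(x)
    order = sorted(range(n), key=lambda i: -x[i])        # x_(1) >= x_(2) >= ... >= x_(n)
    vals = [1.0] + [x[i] for i in order] + [0.0]         # conventions x_(0) = 1, x_(n+1) = 0
    total, prefix = 0.0, []
    for k in range(n + 1):
        if k > 0:
            prefix.append(order[k - 1])
        total += (vals[k] - vals[k + 1]) * f(frozenset(prefix))
    return total

def triangle_cut(S):
    """Toy submodular function: number of triangle edges with exactly one endpoint in S."""
    return sum(1 for u, v in [(0, 1), (0, 2), (1, 2)] if (u in S) != (v in S))

x = [0.2, 0.7, 0.4]
print(multilinear(triangle_cut, x), lovasz(triangle_cut, x))
```

At integral points the two extensions coincide with $f$; at fractional points the multilinear value is at least the Lovász value, as the lemma states.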



A refinement of this lemma says that we can also consider a partition of the ground set and apply an
independent threshold set on each part. This gives a certain hybrid between the multilinear and Lovász
extensions.

Lemma 19. For any partition $X = X_1 \cup X_2$,

$$F(x) \ge \mathbf{E}[f((T_1(x) \cap X_1) \cup (T_2(x) \cap X_2))]$$

where $T_1(x)$ and $T_2(x)$ are two independently random threshold sets for $x$.

Intuitively, sampling from $Y$ independently with probability $p$ is not worse than taking all of $Y$ with
probability $p$ or none of it with probability $1 - p$. This follows from statements proved in [10], but we give a
proof here for completeness.

Proof. $F(x) = \mathbf{E}[f(R(x))]$ where $R(x)$ is sampled independently with probabilities $x_i$. Let's condition on
$R(x) \cap X_2 = R_2$. This restricts the remaining randomness to $X_1$, and we get $f(R(x)) = f(R_2) + g(R(x))$,
where $g(S) = f_{R_2}(S \cap X_1)$ is again submodular. By Lemma 18, $\mathbf{E}[g(R(x))] \ge \mathbf{E}[g(T_1(x))]$ where $T_1(x)$ is a
random threshold set. Hence,

$$\mathbf{E}[f(R(x)) \mid R(x) \cap X_2 = R_2] = f(R_2) + \mathbf{E}[g(R(x))] \ge f(R_2) + \mathbf{E}[g(T_1(x))] = \mathbf{E}[f(R_2 \cup (T_1(x) \cap X_1))].$$

By randomizing $R_2 = R(x) \cap X_2$, we get

$$\mathbf{E}[f(R(x))] \ge \mathbf{E}[f((R(x) \cap X_2) \cup (T_1(x) \cap X_1))].$$

Repeating the same process, conditioning on $T_1(x)$ and applying Lemma 18 to $R(x) \cap X_2$, we get

$$\mathbf{E}[f(R(x))] \ge \mathbf{E}[f((T_2(x) \cap X_2) \cup (T_1(x) \cap X_1))].$$

□



B Pipage rounding for non-monotone submodular functions 

The pipage rounding technique [1, 3, 4] starts with a point in the base polytope $y \in B(\mathcal{M})$ and produces an
integral solution $S \in \mathcal{I}$ (in fact, a base) of expected value $\mathbf{E}[f(S)] \ge F(y)$. We recall the procedure here, in
its randomized form [4].

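Subroutine HitConstraint($y$, $i$, $j$):
  Denote $\mathcal{A} = \{A \subseteq X : i \in A, j \notin A\}$;
  Find $\delta = \min_{A \in \mathcal{A}} (r_{\mathcal{M}}(A) - y(A))$
    and an optimal $A \in \mathcal{A}$;
  If $y_j < \delta$ then $\{\delta \leftarrow y_j,\ A \leftarrow \{j\}\}$;
  $y_i \leftarrow y_i + \delta$, $y_j \leftarrow y_j - \delta$;
  Return $(y, A)$.

Algorithm PipageRound($\mathcal{M}$, $y$):
  While ($y$ is not integral) do
    $T \leftarrow X$;
    While ($T$ contains fractional variables) do
      Pick $i, j \in T$ fractional;
      $(y^+, A^+) \leftarrow$ HitConstraint($y$, $i$, $j$);
      $(y^-, A^-) \leftarrow$ HitConstraint($y$, $j$, $i$);
      $p \leftarrow \|y^+ - y\| / \|y^+ - y^-\|$;
      With probability $p$, $\{y \leftarrow y^-,\ T \leftarrow T \cap A^-\}$;
      Else $\{y \leftarrow y^+,\ T \leftarrow T \cap A^+\}$;
    EndWhile
  EndWhile
  Output $y$.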
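To make the random step concrete, here is a simplified sketch for the special case of a uniform matroid of rank $k$, where the base polytope is $\{y \in [0,1]^n : \sum_i y_i = k\}$ and the only constraints that can become tight along $e_i - e_j$ are $y_i \le 1$ and $y_j \ge 0$. This is our own specialization, not the general procedure: there is no rank oracle and no tight-set bookkeeping, and the function name is hypothetical. The probability of moving to $y^-$ is $\|y^+ - y\| / \|y^+ - y^-\|$ as in the pseudocode, so each coordinate is preserved in expectation.

```python
import random
from fractions import Fraction

def pipage_round_uniform(y, rng):
    """Randomized pipage rounding for a uniform matroid (assumes sum(y) is an integer).

    y: list of Fractions in [0, 1]. Returns a 0/1 list with the same sum.
    """
    y = list(y)
    while True:
        frac = [i for i, v in enumerate(y) if 0 < v < 1]
        if len(frac) < 2:
            break  # with an integral sum and exact arithmetic, no fractional entry remains
        i, j = frac[0], frac[1]
        up = min(1 - y[i], y[j])     # largest step along e_i - e_j (gives y+)
        down = min(y[i], 1 - y[j])   # largest step along e_j - e_i (gives y-)
        p = float(up / (up + down))  # = ||y+ - y|| / ||y+ - y-||
        if rng.random() < p:
            y[i] -= down             # move to y-
            y[j] += down
        else:
            y[i] += up               # move to y+
            y[j] -= up
    return [int(v) for v in y]

rng = random.Random(1)
print(pipage_round_uniform([Fraction(1, 4), Fraction(3, 4), Fraction(1, 2), Fraction(1, 2)], rng))
```

Each step makes at least one of the two chosen coordinates integral, so the loop runs at most $n$ times; convexity of $F$ along $e_i - e_j$ is what makes the expected value of $F$ non-decreasing under this random choice.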

The application in [3] was to monotone submodular functions, but as the authors mention, monotonicity
is not used anywhere in the analysis. The technique as described in [4] yields the following.

Lemma 20. The pipage rounding technique, given a membership oracle for a matroid $\mathcal{M} = (X, \mathcal{I})$, a value
oracle for a submodular function $f : 2^X \to \mathbb{R}_+$, and $y$ in the base polytope $B(\mathcal{M})$, returns a random base $B$
of value $\mathbf{E}[f(B)] \ge F(y)$.

Monotonicity is used in [3] only to argue that a fractional solution can be assumed to lie in the base
polytope without loss of generality. Therefore, if we are working with the base polytope (as in Section 4.2),
we can use the pipage rounding technique without any modification.

If we are working with non-monotone submodular functions and the matroid polytope, we have to make
some additional adjustments to ensure that we do not lose anything when rounding a fractional solution.
We proceed as follows. The following procedure takes a point $x \in P(\mathcal{M})$ and while there is a fractional
coordinate, it either pushes it to its maximum possible value, or makes it zero and removes it from the
problem.

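Algorithm Adjust($\mathcal{M}$, x): 
    While ($x \notin B(\mathcal{M})$) do 
        If (there is $i$ and $\delta > 0$ such that $x + \delta e_i \in P(\mathcal{M})$) do 
            Let $x_{\max} = x_i + \max\{\delta : x + \delta e_i \in P(\mathcal{M})\}$; 
            Let $p = x_i / x_{\max}$; 
            With probability p, $\{x_i \leftarrow x_{\max}\}$; 
            Else $\{x_i \leftarrow 0\}$; 
        EndIf 
        If (there is $i$ such that $x_i = 0$) do 
            Delete $i$ from $\mathcal{M}$ and remove the $i$-coordinate from $x$. 
    EndWhile 
    Output ($\mathcal{M}$, x). 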

Lemma 21. Given $x \in P(\mathcal{M})$ and a submodular function $f(S)$ with its extension $F(x)$, the procedure 
Adjust($\mathcal{M}$, x) yields a restricted matroid $\mathcal{M}'$ and a point $y \in B(\mathcal{M}')$ such that $E[F(y)] \geq F(x)$. 

Proof. If $x \in P(\mathcal{M})$, there is always a point $z \in B(\mathcal{M})$ dominating $x$ in every coordinate (see Corollary 
40.2h). Therefore, if $x \notin B(\mathcal{M})$, there is a coordinate which can be increased while staying in $P(\mathcal{M})$. 

In each step, when we choose randomly between increasing $x_i$ and setting it to zero, the objective function is linear 
in $x_i$ and the expectation of $F(x)$ is preserved. Hence, the process forms a martingale. At the end, $E[F(y)]$ is 
equal to the initial value $F(x)$. 

As long as we increase variables, each variable is increased to its maximum value and cannot be increased 
twice. Therefore, after at most $n$ steps we either reach a point in $B(\mathcal{M})$ or make some variable $x_i$ equal to 
0. A variable equal to 0 is removed, which can be repeated at most $n$ times. Hence, the process terminates 
in $O(n^2)$ time. □ 
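
The martingale step can be verified in one line. This is a sketch: $x'$ denotes the point after one randomized step on coordinate $i$, and the decomposition $F(x) = a + b\,x_i$, with $a$ and $b$ independent of $x_i$, is notation introduced here for illustration (it exists because the extension $F$ is multilinear).

```latex
\[
E[x_i'] = p \cdot x_{\max} + (1 - p) \cdot 0 = \frac{x_i}{x_{\max}} \, x_{\max} = x_i,
\qquad
E[F(x')] = a + b \, E[x_i'] = a + b \, x_i = F(x).
\]
```

Deleting a coordinate with $x_i = 0$ does not change $F(x)$, so the expectation is preserved across the whole procedure.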

For a given $x \in P(\mathcal{M})$, we run $(\mathcal{M}', y)$ := Adjust($\mathcal{M}$, x), followed by PipageRound($\mathcal{M}'$, y). The 
outcome is a base in the restricted matroid where some elements have been deleted, i.e. an independent 
set in the original matroid. We call this the extended pipage rounding procedure. Lemma 21 together with 
Lemma 20 gives the following. 

Lemma 22. The extended pipage rounding procedure, given a membership oracle for a matroid $\mathcal{M} = (X, \mathcal{I})$, 
a value oracle for a submodular function $f : 2^X \to \mathbb{R}_+$, and $x$ in the matroid polytope $P(\mathcal{M})$, yields a random 
independent set $S \in \mathcal{I}$ of value $E[f(S)] \geq F(x)$. 

C Solutions of non-trivial value 

In this section, we prove that solutions of non-trivial value always exist for the problems of maximizing 
a non-monotone submodular function subject to a matroid independence or matroid base constraint. We 
can assume that our matroid does not contain any loops (which can be removed beforehand), and in the 
case of a matroid base constraint no co-loops either (since co-loops participate in every solution and can be 
contracted). The bounds that we prove here are needed to ensure that our sampling errors can be made 
negligible compared to the value of the optimum. 

Lemma 23. Let $\mathcal{M} = (X, \mathcal{I})$ be a matroid without loops (elements which are never in an independent set), 
$|X| = n$, and let $f : 2^X \to \mathbb{R}_+$ be a (non-monotone) submodular function such that $\max_{S \subseteq X} f(S) = M$. Let 
$\mathrm{OPT} = \max\{f(I) : I \in \mathcal{I}\}$. Then 

$\mathrm{OPT} \geq \frac{1}{n} M$. 
Proof. We consider only solutions of size $|I| \leq 1$, which are independent because our matroid does not contain 
any loops. Let $f(I^*) = \max\{f(I) : |I| \leq 1\}$. By submodularity and nonnegativity of $f$, the value of any 
non-empty set $S$ can be estimated as 

$f(S) \leq \sum_{j \in S} f(\{j\}) \leq n \, f(I^*)$. 

By definition, we also have $f(\emptyset) \leq f(I^*)$. Therefore, $M = \max_S f(S) \leq n \, f(I^*)$. □ 
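
The bound can be checked by brute force on a small instance. The sketch below uses a directed cut function on a hypothetical 4-vertex graph (the arc list is made up for illustration) and compares the best set of size at most 1 with the maximum over all subsets. No matroid appears in the code, because in a loopless matroid every set of size at most 1 is independent.

```python
from itertools import combinations


def directed_cut(arcs):
    """Return f with f(S) = number of arcs leaving S (non-negative, submodular)."""
    def f(S):
        S = set(S)
        return sum(1 for u, v in arcs if u in S and v not in S)
    return f


def best_small_and_max(ground, f):
    """Return (max of f over sets of size <= 1, max of f over all subsets)."""
    all_subsets = (c for r in range(len(ground) + 1)
                   for c in combinations(ground, r))
    M = max(f(S) for S in all_subsets)
    best_small = max([f(())] + [f((j,)) for j in ground])
    return best_small, M


# Hypothetical instance: vertices 0..3 with five arcs.
ground = [0, 1, 2, 3]
f = directed_cut([(0, 1), (0, 2), (1, 2), (2, 3), (3, 0)])
best_small, M = best_small_and_max(ground, f)
# Lemma 23 predicts n * best_small >= M.
print(best_small, M, len(ground) * best_small >= M)
```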

Lemma 24. Let $\mathcal{M} = (X, \mathcal{I})$ be a matroid without loops (elements which are never in a base) and coloops 
(elements which are in every base), $|X| = n$, and let $\mathcal{B}$ denote the bases of $\mathcal{M}$. Let $f : 2^X \to \mathbb{R}_+$ be a 
(non-monotone) submodular function such that $\max_{S \subseteq X} f(S) = M$. Let $\mathrm{OPT} = \max\{f(B) : B \in \mathcal{B}\}$. Then 

$\mathrm{OPT} \geq \frac{1}{n^2} M$. 

Proof. Let us find $B = \{b_1, \ldots, b_k\}$ greedily, by including in each step an element which has maximum possible 
marginal value $f_{\{b_1, \ldots, b_\ell\}}(b_{\ell+1})$ among all elements that preserve independence. We claim that $f(B) \geq \frac{1}{n^2} M$. 

The first element $b_1$ is the element of maximum value $\max_{i \in X} f(\{i\})$ (because there are no loops, all 
elements are eligible). Let $M = f(S^*)$. By submodularity, 

$f(S^*) \leq \sum_{i \in S^*} f(\{i\}) \leq n \, f(\{b_1\})$. 

Therefore, $f(\{b_1\}) \geq \frac{1}{n} M$. However, additional elements can decrease this value. Consider the last element 
$b_k$ that we added to $B$ and assume that $f(B) < \frac{1}{n^2} M$. Due to our greedy procedure and submodularity, the 
marginal values of successive elements keep decreasing, and therefore 

$f_{B \setminus \{b_k\}}(b_k) \leq \frac{1}{k-1} \left( f(B) - f(\{b_1\}) \right) < -\frac{1}{n^2} M$. 

Since there are no coloops in $\mathcal{M}$, adding $b_k$ was not our only choice; otherwise, we would have $\mathrm{rank}(X \setminus \{b_k\}) < 
\mathrm{rank}(X)$, which means exactly that $b_k$ is a coloop. So there is another element $b'_k$ that would form a base 
$\{b_1, \ldots, b_{k-1}, b'_k\}$. The reason we did not select $b'_k$ was that $f_{B \setminus \{b_k\}}(b'_k) \leq f_{B \setminus \{b_k\}}(b_k) < -\frac{1}{n^2} M$. By 
submodularity, if we add both elements, we obtain a set $\tilde{B} = \{b_1, \ldots, b_k, b'_k\}$ such that 

$f(\tilde{B}) = f(B) + f_B(b'_k) \leq f(B) + f_{B \setminus \{b_k\}}(b'_k) < \frac{1}{n^2} M - \frac{1}{n^2} M = 0$, 

which is a contradiction with the nonnegativity of $f$. □ 
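
The greedy construction used in the proof can be sketched as follows. The independence oracle `is_independent`, the instance (a directed cut function on a small complete bipartite graph with a partition matroid) and all names are assumptions made for this illustration, not part of the lemma.

```python
from itertools import combinations


def greedy_base(ground, is_independent, f):
    """Build a base by repeatedly adding the feasible element of largest marginal value."""
    B = []
    while True:
        candidates = [e for e in ground
                      if e not in B and is_independent(B + [e])]
        if not candidates:
            return B
        current = f(B)
        # max() is evaluated before append, so B is unchanged inside the key.
        B.append(max(candidates, key=lambda e: f(B + [e]) - current))


# Hypothetical instance: X1 = {0, 1}, X2 = {2, 3}, all arcs from X1 to X2.
X1, X2 = [0, 1], [2, 3]
ground = X1 + X2
arcs = [(u, v) for u in X1 for v in X2]


def f(S):
    """Directed cut value: number of arcs leaving S."""
    S = set(S)
    return sum(1 for u, v in arcs if u in S and v not in S)


def is_independent(S):
    """Partition matroid: at most one element from X1 and at most one from X2."""
    return (sum(1 for e in S if e in X1) <= 1
            and sum(1 for e in S if e in X2) <= 1)


B = greedy_base(ground, is_independent, f)
M = max(f(S) for r in range(len(ground) + 1)
        for S in combinations(ground, r))
n = len(ground)
print(B, f(B), M, n * n * f(B) >= M)
```

On this instance the greedy base has value 1 while M = 4, consistent with the bound $f(B) \geq M / n^2$ for $n = 4$.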

We remark that this bound is tight up to a constant factor. Consider a complete bipartite directed graph 
on $X = X_1 \cup X_2$, $|X_1| = |X_2| = n/2$. The submodular function $f(S)$ is the directed cut function, expressing 
the number of arcs from $S$ to $\bar{S}$. The matroid $\mathcal{M}$ has bases that contain exactly one element from $X_1$ and 
$n/2 - 1$ elements from $X_2$. It is easy to see that any base $B$ has value $f(B) = 1$, while $f(X_1) =$ 